Distance scalable no touch computing

ABSTRACT

Disclosed herein are techniques for scaling and translating gestures such that the applicable gestures for control may vary depending on the user's distance from a gesture-based system. The techniques for scaling and translation may take the varying distances from which a user interacts with components of the gesture-based system, such as a computing environment or capture device, into consideration with respect to defining and/or recognizing gestures. In an example embodiment, the physical space is divided into virtual zones of interaction, and the system may scale or translate a gesture based on the zones. A set of gesture data may be associated with each virtual zone such that gestures appropriate for controlling aspects of the gesture-based system may vary throughout the physical space.

BACKGROUND

Many computing applications such as computer games, multimedia applications, office applications, or the like use controls to allow users to manipulate characters or control other aspects of an application. Typically such controls are input using, for example, controllers, remotes, keyboards, mice, or the like. Unfortunately, such controls can be difficult to learn, thus creating a barrier between a user and such applications. Furthermore, such controls may be different from the actual actions for which the controls are used. For example, a game control that causes a game character to swing a baseball bat may be a combination of buttons and may not correspond to an actual motion of swinging the baseball bat, or a control to reposition a view on a screen, such as repositioning the view of a map in a map application, may be a selection of arrow buttons on a keyboard and may not correspond to the actual motion of repositioning the view.

SUMMARY

In a gesture-based system, gestures may control aspects of a computing environment or application, where the gestures may be derived from a user's position or movement in a physical space. To create a satisfactory user experience, it may be desirable that the gestures correspond to natural user positions or motions with respect to the distance that the user interacts with the device. For example, a user may interact with a cell phone or other mobile device at a very close distance, but may interact with a television screen at a larger distance. Disclosed herein are techniques for scaling and translating gestures such that the applicable gestures for control may vary depending on the user's distance from the computing environment. The techniques for scaling and translation may take the varying distances from which a user interacts with components of the gesture-based system, such as a computing environment or capture device, into consideration with respect to defining and/or recognizing gestures. In an example embodiment, the physical space is divided into virtual zones of interaction, and the system may scale or translate a gesture based on the zones.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing Summary, as well as the following Detailed Description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the embodiments, there are shown in the drawings example constructions of the embodiments; however, the embodiments are not limited to the specific methods and instrumentalities disclosed. In the drawings:

FIG. 1 illustrates an example embodiment of a target recognition, analysis, and tracking system with a user playing a game.

FIGS. 2A, 2B, and 2C illustrate example embodiments of a physical space and various components of a gesture-based system that may implement gesture identification techniques based on virtual zones.

FIG. 3A depicts an example flow diagram for identifying a user's gesture with respect to virtual zones that each represent a portion of the physical space.

FIG. 3B depicts an example flow diagram for associating a set of gesture data with each virtual zone that represents a portion of the physical space.

FIG. 4 illustrates an example embodiment of a capture device and an example computing environment that may be used in a target digitization, extraction, and tracking system.

FIG. 5A illustrates a skeletal mapping of a user that has been generated from a target recognition, analysis, and tracking system such as that shown in FIG. 4.

FIG. 5B illustrates further details of a gesture recognizer architecture such as that shown in FIG. 4.

FIG. 6 illustrates an example embodiment of a computing environment in which the techniques described herein may be embodied.

FIG. 7 illustrates another example embodiment of a computing environment in which the techniques described herein may be embodied.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

Disclosed herein are techniques for gesture scaling and translation. The subject matter of the disclosed embodiments is described with specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the claimed subject matter might also be embodied in other ways, to include elements similar to the ones described in this document in conjunction with other present or future technologies.

Embodiments are related to techniques for gesture scaling and translation. A gesture may be derived from a user's position or motion in the physical space and may include any user motion, dynamic or static, such as running, moving a finger, or a static pose. According to an example embodiment, a capture device, such as a camera, may capture data, such as image data, that is representative of the user's gesture(s). A computer environment may be used to recognize and analyze the gestures made by the user in the user's three-dimensional physical space such that the user's gestures may be interpreted to control aspects of a system or application space. The computer environment may display user feedback by mapping the user's gesture(s) to an avatar on a screen.

A gesture-based system or application may have default gesture information for determining if a user is performing a particular gesture. For example, a system may have a gesture recognizer that compares captured data to a database of default gesture information such as filters with default gesture parameters. The gesture recognizer may compare data received by the capture device to the default gesture information and output a gesture. The output may include a confidence level that the output gesture was performed.

A gesture-based system may employ techniques for scaling and translating gestures to accommodate the different distances from which a user interacts with the system. Thus, based on the distance of the user from the computing environment or capture device, for example, different scales of gesture inputs may be input for a given desired outcome of the gesture input. In an example embodiment, the physical space is divided into zones, and a set of gestures may be applicable for each zone. Each zone may represent a region of the physical space that is defined according to a distance(s) from a capture device. Gestures within the set of gestures may be unique to the zone or may be common across several zones. Also, the system may perform efficient gesture recognition such that, near the boundaries of a zone, the system can evaluate the user's gestures within the context of multiple zones.

The system, methods, techniques, and components of scaling and translating gestures may be embodied in a multi-media console, such as a gaming console, or in any other computing environment in which it is desired to display a visual representation of a target, including, by way of example and without any intended limitation, satellite receivers, set top boxes, arcade games, personal computers (PCs), portable telephones, personal digital assistants (PDAs), and other hand-held devices.

FIG. 1 illustrates an example embodiment of a configuration of a target recognition, analysis, and tracking gesture-based system 10 that may employ the disclosed techniques for gesture scaling and translation. In the example embodiment, a user 18 is playing a bowling game. In an example embodiment, the system 10 may recognize, analyze, and/or track a human target such as the user 18. The system 10 may gather information related to the user's motions, facial expressions, body language, emotions, etc., in the physical space. For example, the system may identify and scan the human target 18. The system 10 may use body posture recognition techniques to identify the body type of the human target 18. The system 10 may identify the body parts of the user 18 and how they move.

As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may include a computing environment 212. The computing environment 212 may be a multimedia console, a personal computer (PC), a cellular device, a gaming system or console, a handheld computing device, a PDA, a music player, a cloud computer, a capture device, or the like. According to an example embodiment, the computing environment 212 may include hardware components and/or software components such that the computing environment 212 may be used to execute applications. An application may be any program that operates or is executed by the computing environment, including both gaming and non-gaming applications, such as a word processor, spreadsheet, media player, database application, computer game, video game, chat, forum, community, instant messaging, or the like.

As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may include a capture device 202. The capture device 202 may be, for example, a camera that may be used to visually monitor one or more users, such as the user 18, such that gestures performed by the one or more users may be captured, analyzed, and tracked to perform one or more controls or actions within an application. In the example embodiment shown in FIG. 1, a virtual object is a bowling ball and the user moves in the three-dimensional physical space as if actually handling the bowling ball. The user's gestures in the physical space can control the bowling ball displayed on the screen 14. In example embodiments, the human target such as the user 18 may actually have a physical object. In such embodiments, the user of an electronic game may be holding the object such that the motions of the player and the object may be used to adjust and/or control parameters of the game. For example, the motion of a player holding a racket may be tracked and utilized for controlling an on-screen racket in an electronic sports game. In another example embodiment, the motion of a player holding an object may be tracked and utilized for controlling an on-screen weapon in an electronic combat game.

According to one embodiment, the target recognition, analysis, and tracking system 10 may be connected to an audiovisual device 16 such as a television, a monitor, a high-definition television (HDTV), or the like that may provide game or application visuals and/or audio to a user such as the user 18. For example, the computing environment 212 may include a video adapter such as a graphics card and/or an audio adapter such as a sound card that may provide audiovisual signals associated with the game application, non-game application, or the like. The audiovisual device 16 may receive the audiovisual signals from the computing environment 212 and may then output the game or application visuals and/or audio associated with the audiovisual signals to the user 18. According to one embodiment, the audiovisual device 16 may be connected to the computing environment 212 via, for example, an S-Video cable, a coaxial cable, an HDMI cable, a DVI cable, a VGA cable, or the like.

As used herein, a computing environment may refer to a single computing device or to a computing system. The computing environment may include non-computing components. As used herein, a computing system, computing device, computing environment, computer, processor, or other computing component may be used interchangeably. For example, the computing environment may comprise the entire target recognition, analysis, and tracking system 10 shown in FIG. 1. The computing environment may include the audiovisual device 16 and/or the capture device 202. Either or both of the exemplary audiovisual device 16 or capture device 202 may be an entity separate from but coupled to the computing environment, or may be part of the computing device that processes and displays, for example. Thus, the computing environment may be a standalone capture device comprising a processor that can process the captured data.

As shown in FIG. 1, the target recognition, analysis, and tracking system 10 may be used to recognize, analyze, and/or track a human target such as the user 18. For example, the user 18 may be tracked using the capture device 202 such that the gestures of user 18 may be interpreted as controls that may be used to affect the application being executed by computer environment 212. Thus, according to one embodiment, the user 18 may move his or her body to control the application. The system 10 may track the user's body and the motions made by the user's body, including gestures that control aspects of the system, such as the application, operating system, or the like.

The system 10 may translate an input to a capture device 202 into an animation, the input being representative of a user's motion, such that the animation is driven by that input. Thus, the user's motions may map to a visual representation, such as an avatar, such that the user's motions in the physical space are emulated by the avatar. The rate that frames of image data are captured and displayed may determine the level of continuity of the displayed motion of the visual representation.

FIG. 1 depicts an example embodiment of an application executing on the computing environment 212 that may be a bowling game that the user 18 may be playing. In this example, the computing environment 212 may use the audiovisual device 16 to provide a visual representation of a bowling alley and bowling lanes to the user 18. The computing environment 212 may also use the audiovisual device 16 to provide a visual representation of a player avatar 19 that the user 18 may control with his or her movements. The computer environment 212 and the capture device 202 of the target recognition, analysis, and tracking system 10 may be used to recognize and analyze the gestures made by the user 18 in the user's three-dimensional physical space such that the user's gestures may be interpreted to control the player avatar 19 in game space. For example, as shown in FIG. 1, the user 18 may make a bowling motion in a physical space to cause the player avatar 19 to make a bowling motion in the game space. Other movements by the user 18 may also be interpreted as controls or actions, such as controls to walk, select a ball, position the avatar on the bowling lane, swing the ball, etc.

Multiple users can interact with each other from remote locations. The computing environment 212 may use the audiovisual device 16 to provide the visual representation of an avatar that another user may control with his or her movements. For example, the visual representation of another bowler displayed on the audiovisual device 16 may be representative of another user, such as a second user in the physical space with the user, or a networked user in a second physical space. Similarly, an avatar may be displayed in non-gaming applications, such as a word processing or spreadsheet document. Avatars may be displayed that represent respective users that are remote to each other.

Gestures may be used in a video-game-specific context such as the bowling game example shown in FIG. 1. In another game example such as a driving game, various motions of the hands and feet may correspond to steering a vehicle in a direction, shifting gears, accelerating, and braking. The player's gestures may be interpreted as controls that correspond to actions other than controlling the avatar 19, such as gestures used for input in a general computing context. For instance, various motions of the user's 18 hands or other body parts may be used to end, pause, or save a game, select a level, view high scores, communicate with a friend, etc.

While FIG. 1 depicts the user in a video-game-specific context, it is contemplated that the target recognition, analysis, and tracking system 10 may interpret target movements for controlling aspects of an operating system and/or application that are outside the realm of games. Virtually any controllable aspect of an operating system and/or application may be controlled by movements of the target such as the user 18. For example, the user's gestures may correspond to common system-wide tasks such as navigating up or down in a hierarchical list, opening a file, closing a file, and saving a file. The user's gestures may be controls applicable to an operating system, non-gaming aspects of a game, or a non-gaming application. For example, the user's gestures may be interpreted as object manipulation, such as controlling a user interface. For example, consider a user interface having blades or a tabbed interface lined up vertically left to right, where the selection of each blade or tab opens up the options for various controls within the application or the system. The system may identify the user's hand gesture for movement of a tab, where the user's hand in the physical space is virtually aligned with a tab in the application space. The gesture, including a pause, a grabbing motion, and then a sweep of the hand to the left, may be interpreted as the selection of a tab, and then moving it out of the way to open the next tab.

FIGS. 2A-2C each illustrate an example of a system 200 that can capture a user in a physical space 201 and map captured data to a visual representation in a virtual environment. The system 200 may comprise a computing environment, capture device, display, or any combination thereof. A computing environment may be a multimedia console, a personal computer (PC), a gaming system or console, a handheld computing device, a PDA, a mobile phone, a cloud computer, or the like, and may include or otherwise connect to a capture device or display. The capture device, computing device, and display device may comprise any suitable device that performs the desired functionality, such as the devices described with respect to FIG. 1 above or FIGS. 3-8 described below.

The system 200 shown in FIG. 2A comprises an example capture device 208, a computing environment 210, and a display device 212. The user may interact with the system 200 at varying distances. The system 200 may employ techniques for scaling and translating gestures to accommodate the different distances from which a user interacts with the system 200. Thus, based on the distance of the user from the computing environment or capture device, for example, different scales of gesture inputs may be input for a given desired outcome of the gesture input.

As will be described in more detail below, the physical space may be divided into virtual zones defined at varying distances from the capture device 208. It is contemplated that the physical space may comprise a single zone or be divided into any number of zones. It is noted that, as used herein, a zone comprises any region, area, or section in the physical space that is characterized by a particular feature or quality. For example, the zone may be defined as a two-dimensional region, a three-dimensional region, a spherical or cubical region, a split down the middle of a physical space, or the like. The zones may be a combination of types of zones such that the physical space may be divided into zones of different shapes and sizes.

This system may continuously monitor the depth/position of the user (h) and alter the scale of gesture required for a given result based on variation of (h). As the user moves between zones, the system may receive captured data representative of the user's gestures. The system may identify the user's depth/position (h) and the applicable gestures that correspond to the identified position, for example, based on the virtual zones. In real time, the system may output a command associated with the gesture. Thus, the system is capable of adapting gesture recognition to correspond to the user's distance from the system.
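
For illustration, the following minimal Python sketch shows one way such a monitoring loop might be organized: each frame, the user's depth/position (h) is resolved to a zone, and only that zone's gesture set is evaluated. Names such as frames(), user_depth(), and match() are assumptions for this sketch, not part of any particular embodiment.

```python
# Hypothetical sketch: adapt gesture recognition to the user's current
# distance from the system by selecting the gesture set for the user's zone.

def run_recognition_loop(capture_device, gesture_sets_by_zone, recognizer,
                         zone_for_depth, dispatch):
    """Per-frame loop: resolve depth to a zone, recognize, output the command."""
    for frame in capture_device.frames():        # assumed per-frame iterator
        h = frame.user_depth()                   # assumed depth of the tracked user, in meters
        zone = zone_for_depth(h)                 # e.g., the classify_zone sketch further below
        gesture = recognizer.match(frame, gesture_sets_by_zone[zone])
        if gesture is not None:
            dispatch(gesture.command)            # output the associated command in real time
```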

As will be described in more detail below, multiple computing environments may exist in the physical space, and therefore different modes may be applicable. The system may switch between modes, or zone sets, as appropriate. For example, a system may have three modes: a television mode, a laptop mode, and a phone mode. The system may identify the appropriate mode and implement a gesture package for that particular mode that comprises the gestures applicable in the zones in the physical space. For example, if the computing environment detects that the output is to a television screen, the system may implement the zones in the television mode and analyze the user's gestures with respect to those zones. If the output is changed to a laptop, the system may switch to the zones in the laptop mode. The mode implemented and the way the zones are defined in a mode may be influenced by several factors, as described in more detail below.
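
A rough sketch of this mode switching is shown below. The mode names, zone boundaries, and gesture categories are illustrative assumptions only; the point is that each mode carries its own gesture package (a zone layout plus the gestures for each zone).

```python
# Hypothetical sketch: each mode carries a "gesture package" (zone layout
# plus per-zone gesture categories); switching modes swaps the whole package.

GESTURE_PACKAGES = {
    "television": {"zone_bounds_m": [0.15, 1.0, 3.0],
                   "gestures": ["finger", "arm", "partial_body", "full_body"]},
    "laptop":     {"zone_bounds_m": [0.05, 0.6],
                   "gestures": ["touch", "finger", "hand"]},
    "phone":      {"zone_bounds_m": [0.15],
                   "gestures": ["touch", "finger"]},
}

class ModeManager:
    """Tracks which gesture package is active for the current output device."""

    def __init__(self, mode="television"):
        self.mode = mode

    def on_output_changed(self, new_mode):
        """Switch zone sets when the output moves, e.g., from a television to a laptop."""
        if new_mode in GESTURE_PACKAGES:
            self.mode = new_mode

    @property
    def package(self):
        return GESTURE_PACKAGES[self.mode]
```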

In the example shown in FIG. 2A, the computing environment 210 may be a personal computer, where the user may be at varying distances from the personal computer 210 and/or a display device 212 connected to the personal computer 210. In FIG. 2A, the physical space 201 is divided into four example zones 204 a, 204 b, 204 c, 204 d of gesture detection based on a distance from the capture device 208. The system may detect that the user is, for example, very close (touch/tablet or surface computer), relatively close (arm's length from the screen), or at distances further away. For exemplary purposes, the zones may be defined as follows:

Zone 1: (h=0) Touch. User is touching the screen;

Zone 2: (h<=15 cm) Close. Cell phone case; the portion of the user making the relevant gesture is very close to the sensor device;

Zone 3: (h=15 cm to 1 m) Near. PC case; user is sitting in front of their personal computing device (desktop or laptop);

Zone 4: (h>1 m) Far. Living room case; user is relatively far from the input sensors.

We could also characterize these inputs according to the type of gestures that are applicable in each zone (see the sketch following this list). For example, the zones could be characterized as follows:

Zone 1: direct (touch) input to sensors;

Zone 2: finger scale gestures;

Zone 3: arm/partial body based gestures;

Zone 4: full body gestures.
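
The example boundaries and characterizations above can be expressed as a small lookup table. The Python sketch below is illustrative only; the explicit touch flag and the exact threshold handling are assumptions rather than required behavior.

```python
# Sketch combining the example zone boundaries above with the kind of
# gesture input expected in each zone (touch, <=15 cm, 15 cm to 1 m, >1 m).

ZONES = [
    (1, 0.00, 0.00, "direct (touch) input to sensors"),
    (2, 0.00, 0.15, "finger scale gestures"),
    (3, 0.15, 1.00, "arm/partial body based gestures"),
    (4, 1.00, float("inf"), "full body gestures"),
]

def classify_zone(h_meters, touching=False):
    """Map the user's depth h (meters from the sensor) to a zone and its gesture scale."""
    if touching or h_meters <= 0.0:
        return ZONES[0]
    for zone_id, lower, upper, description in ZONES[1:]:
        if lower < h_meters <= upper:
            return (zone_id, lower, upper, description)
    return ZONES[-1]

print(classify_zone(0.10))   # -> zone 2, finger scale gestures
print(classify_zone(2.50))   # -> zone 4, full body gestures
```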

Within each zone, a particular set of gestures may apply. For example, a set of gestures may apply for a user's interaction with the system 200 at a very close distance, and a different set of gestures may apply for a user's interaction at a further distance. The gestures may vary at varying distances, or in the various zones, to correspond to an input that would be natural for a user.

Consider if the gesture-based system 200 shown in FIG. 2A is a media room entertainment system. The user may be seated across a room from the computing environment. For example, a user playing a game may be gesturing in an open space in the middle of the room, whereas the computing environment and a display may be on an entertainment unit against the wall. Similarly, if a user is watching a movie on a screen or television, the user may be seated in a chair across the room from the computing environment and/or display. Consequently, larger gestures, such as full arm or whole body motions, may feel more natural to the user at such distances. Thus, for a media room entertainment type computing environment, it may be more often that gestures in zones 3 or 4 are applicable. However, if the user is in close range to the computing environment, a different set of gestures applicable to close range interaction may apply. For example, when the user is in close range to the display 210, small scale gestures or touch screen gestures may be more natural for a user. The gestures in zone 1 or 2 may therefore comprise touch or small finger movements or facial expressions.

Across the zones, different gestures or gestures defined in different ways may be applicable for the same controls of the system 200. For example, a gesture in zone 1, that may be small scale such as a gesture comprising a finger motion, may issue the same command to control an aspect of the system 200 as a gesture in zone 4, that may be large scale such as a gesture comprising motion of the user's full arm. Similarly, intermediate zones, zone 2 and zone 3, may comprise a gesture derived from a body position or motion that is intermediate in comparison to the gestures in zones 1 and 4, but controls the same aspect of the system 200. Thus, at varying distances, or zones, from the device, varying gestures may be interpreted to issue the same command or control of the system 200. Consider a specific example where the desired command to the system is to advance the page of a document by one page, where the document is displayed on display 210. The applicable gesture may vary in each zone, and the gestures may be defined as follows: zone 1) touch the screen and drag a finger/stylus up the page; zone 2) flick the right index finger upwards by ~10 cm; zone 3) raise the right hand 30 cm; and zone 4) raise the right arm 60 cm. Each of these varying gestures, applicable in the various zones, may be valid for issuing the same general command (next page/page down) to the system.
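
The page-down example above can be sketched as one logical command bound to a differently scaled gesture in each zone. The structure and field names below are assumptions for illustration.

```python
# Sketch: the same logical command ("page down") bound to a differently
# scaled gesture in each zone, mirroring the example definitions above.

PAGE_DOWN_BY_ZONE = {
    1: {"type": "touch_drag",   "direction": "up"},                  # drag finger/stylus up the page
    2: {"type": "finger_flick", "direction": "up", "min_m": 0.10},   # ~10 cm index-finger flick
    3: {"type": "hand_raise",   "min_m": 0.30},                      # raise the right hand 30 cm
    4: {"type": "arm_raise",    "min_m": 0.60},                      # raise the right arm 60 cm
}

def matches_page_down(zone, observed):
    """Check whether an observed motion satisfies the page-down gesture for the zone."""
    spec = PAGE_DOWN_BY_ZONE[zone]
    if observed.get("type") != spec["type"]:
        return False
    return observed.get("displacement_m", 0.0) >= spec.get("min_m", 0.0)

print(matches_page_down(3, {"type": "hand_raise", "displacement_m": 0.35}))  # True
print(matches_page_down(4, {"type": "hand_raise", "displacement_m": 0.35}))  # False: arm raise expected
```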

The system may identify the user's depth/position (h) in the physical space to determine in which zone the user is located. It is contemplated that any suitable manner for dividing the physical space, such as into zones, may be employed, and any suitable manner for tracking the user's position in the physical space to determine the applicable zone may be used. For example, the position may be identified with respect to two dimensions (x, y) or three dimensions (x, y, z). The user's position (h) may be a distance between the coordinates of the user and the coordinates of the system, such as the Euclidean distance or a distance otherwise derived from the Pythagorean theorem, for example.

In an example embodiment, the system may compare the user's depth/position (h) from the system 200 to the distances defined for each zone to determine the applicable zone, and therefore the applicable gestures. In another example embodiment, the boundaries of each zone may be defined with respect to two dimensions or three dimensions on a common coordinate system with the user. By comparing the user's position with the coordinates of the zones, the zone in which the user is located may be identified.
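
Both approaches can be sketched briefly: a Euclidean distance comparison for the first, and a containment test against zone boundaries on a shared coordinate system for the second. The box-shaped zones and example extents below are assumptions for illustration; zones could equally be spherical or irregular as described above.

```python
# Sketch: (a) distance of the user from a system component, and (b) zones as
# three-dimensional boxes on a coordinate system shared with the tracked user.

from dataclasses import dataclass

def distance_from(system_xyz, user_xyz):
    """Euclidean distance h between the user and a system component, in meters."""
    return sum((a - b) ** 2 for a, b in zip(user_xyz, system_xyz)) ** 0.5

@dataclass
class ZoneBox:
    zone_id: int
    x_range: tuple
    y_range: tuple
    z_range: tuple

    def contains(self, x, y, z):
        return (self.x_range[0] <= x <= self.x_range[1] and
                self.y_range[0] <= y <= self.y_range[1] and
                self.z_range[0] <= z <= self.z_range[1])

def zone_for_position(zones, user_xyz):
    """Return the first zone whose box contains the user's position, if any."""
    for zone in zones:
        if zone.contains(*user_xyz):
            return zone.zone_id
    return None
```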

The set of gestures that apply within a zone may overlap a set of gestures that apply within another zone or a plurality of other zones. Thus, the same gesture may be applicable for the same control in multiple zones. For example, a ‘cancel’ command in zone 4 may involve the user crossing their arms in front of the body, but in zones 2 and 3 the gesture for a ‘cancel’ command may comprise the user extending their right arm, palm outwards. There may be a set of gestures that apply across all zones, for example. Similarly, while the physical space may be divided into one zone or any number of zones, the zones themselves may be distinct, overlapping, or a combination thereof.

The system may perform efficient gesture recognition such that, near the boundaries of a zone, the system can evaluate the user's gestures within the context of multiple zones. For example, consider the ‘cancel’ command gesture example described above. The gesture for ‘cancel’ in zone 4 comprises the user crossing his or her arms in front of the body and the gesture in zone 3 comprises an extended palm motion. In this example, zone 3 extends from a distance of h=15 cm to 1 m and zone 4 includes the distance beyond 1 m. A user around the edge of the zone 3/zone 4 boundary should be able to use either the crossed arms or the extended palm gesture.

The system can intelligently determine the appropriate control based on a probability approach, for example. For example, the user may be in zone 4 near the zone 3/zone 4 boundary and perform the extended palm motion for the ‘cancel’ gesture, where the extended palm motion is actually applicable in zone 3. The system may identify that the gesture does not apply to zone 4, and may evaluate the gestures applicable in nearby zones to determine if the motion corresponds to one of those gestures. Alternately, the system may identify that the gesture applies to both zones 3 and 4, corresponding to different commands, and determine which gesture is more probably the intended gesture. The system may analyze many factors to assist in determining a probability that the user intended to perform a particular gesture. For example, the system may analyze the circumstances of the application, status of the system, characteristics of the user, active/inactive status of the user, etc.
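
One simple way to realize such a probability approach is sketched below: gestures from the user's zone and from adjacent zones are all evaluated, and each candidate's recognizer confidence is weighted by how plausible its home zone is given the user's depth. The linear weighting and the softness margin are illustrative assumptions, not a prescribed method.

```python
# Sketch of a probability-style tie-breaker near a zone boundary.

def boundary_weight(h, zone_bounds, softness_m=0.25):
    """Weight in [0, 1] for a zone given depth h: full weight inside the zone,
    decaying linearly to zero within `softness_m` meters outside it."""
    lower, upper = zone_bounds
    if lower <= h <= upper:
        return 1.0
    gap = (lower - h) if h < lower else (h - upper)
    return max(0.0, 1.0 - gap / softness_m)

def resolve_gesture(h, candidates):
    """candidates: list of (command, recognizer_confidence, zone_bounds) tuples."""
    best = None
    for command, confidence, zone_bounds in candidates:
        score = confidence * boundary_weight(h, zone_bounds)
        if best is None or score > best[1]:
            best = (command, score)
    return best  # (command, weighted score), or None if no candidates

# Example: a user at h = 1.05 m performs an extended-palm motion.
print(resolve_gesture(1.05, [("cancel", 0.8, (0.15, 1.0)),      # zone 3 gesture, just past its boundary
                             ("other_command", 0.3, (1.0, 9.9))]))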

The system can tolerate the user moving from one zone to another but still recognize gestures from the previous zone. For example, if the user moves from zone 4 to zone 3, the system may continue to recognize gestures from zone 4. Consider a user executing a gaming application, for example, that involves running or jumping gestures in zone 4. It is natural that some of a user's motion in the physical space may cause the user to move from one zone to another, even if that is not the user's intention. Different users may have a higher propensity to unintentionally overlap boundaries of a zone while performing a gesture. For example, a tall user with a long arm span may extend arm motion into other zones more often than a shorter user with a smaller arm span. The system can predict the anticipated scale of gestures for a given situation based on the distance of the tracked user and utilize that information to improve system response time and accuracy. The system can predict the anticipated scale by identifying other characteristics, such as features of the user, expected controls in the system or executing application, the type of gesture previously performed, the skill level of the user, or the like.

Further, the user can move between zones and the system can seamlessly detect the user's change in position and recognize the corresponding change in a user's gestures. For example, a user may walk up to a touch screen and use a direct input gesture. Then, the user may walk back to 1.5 m away from the screen and use a full body gesture for the same control as the direct input gesture. The system can seamlessly detect the changed zone and corresponding gestures that cause the same control in the system.

The system may provide an indication to the user regarding the zone to which the user's position corresponds. This type of feedback to a user may explain why a particular gesture is not registering with the system. The user may be able to quickly identify that the user is not in the proper zone for the particular gesture. For example, an indicator could be a light bar displayed on display 210, or a physical light bar in the physical space, with an indication of each zone along the bar. The indicator could correlate the user's position to each zone, lighting up along the bar to correspond to the zone in which the user is located. If the bar is lit up between the zones, indicating that the user is between zones or close to a boundary between two zones, the user can reposition himself or herself in the desired zone. In another example embodiment, the indication is a visual indication provided on the display screen, for example. In another example, the indication is an audio cue to the user, such as a voice-over that informs the user that he or she has stepped out of a zone.
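
A minimal sketch of such an indicator, assuming the example zone boundaries used earlier and a small margin that triggers the "between zones" state, is shown below. The margin value and boundary positions are illustrative assumptions.

```python
# Sketch: map the user's depth to the indicator segment to light, including a
# "between zones" state near a boundary.

def indicator_state(h, boundaries_m=(0.15, 1.0), margin_m=0.05):
    """Return ('zone', n) or ('between', (n, n+1)) for the indicator to display."""
    for i, boundary in enumerate(boundaries_m, start=2):
        if abs(h - boundary) <= margin_m:
            return ("between", (i, i + 1))   # straddling the boundary between zones i and i+1
    if h <= 0.0:
        return ("zone", 1)
    if h <= boundaries_m[0]:
        return ("zone", 2)
    return ("zone", 3) if h <= boundaries_m[1] else ("zone", 4)

print(indicator_state(0.50))   # ('zone', 3)
print(indicator_state(0.98))   # ('between', (3, 4)): prompt the user to reposition
```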

The indicator may help the user with proper positioning in the physical space. The system may recognize that the user's gestures do not correspond to the zone in which the user is positioned. For example, the system may recognize that, while the user is in zone 4, the user's gestures are applicable to those that correspond to zone 3. The indication may guide the user to the proper position in the physical space, or virtual zone, that corresponds to the user's gestures. The system may identify a preferred zone, such as a zone preferred for interaction with a large display or a zone preferred for interaction with a laptop. The system may communicate the preferred zone to the user via the indication, such as a visual or audio cue. Alternately, the system can adapt the zones and/or the gestures applicable in a zone to reflect the tendencies of the user.

As described above, the boundaries of each zone may be defined to correspond to the desire that gestures are natural user positions or motions with respect to the user's distance from the system. It is contemplated that the boundaries of the zones may be based on any one or any combination of features of the system, such as the capture device resolution, relative position of the capture device and a computing environment or display, the type of computing environment, the type of application executing, user preferences, RGB capabilities, screen size, available input devices, the components of the computing environment (e.g., audio/video capabilities), physical space characteristics, or the display capabilities, for example. Further, the gestures in each zone may be defined based on a feature or combination of features of the system.

FIG. 2A depicts an example set of zones comprising zones 1, 2, 3, and 4. FIG. 2B illustrates an example computing environment 211 shown as a handheld cellular device. However, while the concepts in these examples are described herein with respect to a specific computing environment, such as the handheld cellular device shown in FIG. 2B, the concepts are not limited to a cellular device. As described above, the computing environment may comprise any suitable device, such as a multimedia console, a personal computer (PC), a cellular device, a gaming system or console, a handheld computing device, a PDA, a music player, a cloud computer, a capture device, or the like. Three example zones, 206 a, 206 b, 206 c, are depicted with respect to the cellular device 211, but as noted it is contemplated that any number of zones may be applicable. FIG. 2C illustrates multiple users, multiple computing environments, multiple capture devices, and multiple displays that may be present in a physical space. As described below, multiple sets of virtual zones, such as zones 207 a, 207 b, 207 c, 207 d and zones 209 a and 209 b, may be defined in the physical space. The examples described with respect to FIG. 2A and the examples described below with respect to FIGS. 2B and 2C demonstrate ways in which several of these factors may influence the boundaries of a zone (e.g., how the zones divide the physical space, the varying sizes/locations of each zone, etc.) and/or the gestures that are applicable for a zone.

Consider a computing environment that is a cell phone or small mobile device 211, such as in FIG. 2B. For a computing environment 211 that is a cell phone or other mobile device, where a user is often interacting at close range to the device, zone 1 gestures may be applicable. It may be natural for a user to use small hand/finger positions or motions when in close range to a handheld cellular device 211. When a user is in close range to the cellular device, for example, the user may find it intuitive to use small motions and view changes on a small display component integrated into the cellular device. Thus, gestures applicable to zone 1, which is at close range to the computing environment, may be applicable.

Numerous characteristics may influence the manner in which the zones virtually divide the physical space and/or the gestures that are applicable to each virtual zone. Consider if the user is conducting a phone call using the cellular device 211. A capture device 213 and small display 214 may be integrated into the cellular device 211. For mobility and the convenience of not having to hold the cellular device, the user 202 may place the cellular device in a fixed position in the physical space and wear an audio piece for communicating via the cellular device while moving about the room. The capture device 213 may capture the user's gestures within view of the capture device. Rather than output a display to the small screen 214 integrated into the cell phone, the cell phone may be connected to a larger display device 212 and output the results to the larger display 210. Thus, at further distances from the cellular device 211, it may be more natural for a user viewing the output to the large display 210 to use larger gestures that comprise an arm or leg, for example. Also, FIG. 2B illustrates how a different set of zones for a particular computing environment, such as cellular device 211, may divide the physical space differently than the zones associated with another computing environment, such as computing environment 210 shown in FIG. 2A.

A resolution or the distance-related capabilities of a capture device may influence the boundaries of the zones and/or the applicable gestures. For example, it may be natural for the gestures applicable to control cellular device 211, with an integrated capture device 213, to comprise small motions regardless of the distance from the computing environment 211. As described above, there may be instances in which the user moves further from the cell phone or other mobile device, such as into zones 2, 3, or 4. However, the same small hand/finger positions or motions applicable in zone 1 may apply at any distance or in any zone, as it is contemplated that the capture device can have a resolution that captures even the slightest motion regardless of the distance.

Thus, the boundaries, or relative distance, of each zone, and therefore the gestures that apply based on distance, may be influenced by the resolution of the capture device. For example, a lower resolution camera may produce poorer quality data at further distances. The camera may recognize finer gestures closer to the camera but require larger scale gestures for recognition at further distances from the capture device. A higher resolution capture device, on the other hand, may be able to recognize small scale gestures at much greater distances, providing flexibility in the scale of gestures that can be recognized throughout the zones.
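
The geometric intuition behind this can be sketched as follows: with a fixed field of view and pixel count, the physical width covered by one pixel grows with distance, so the smallest motion that can be reliably resolved grows with it. The field of view, pixel count, and pixels-per-gesture figures below are illustrative assumptions, not device specifications.

```python
# Sketch: minimum physical motion resolvable at a given distance for a camera
# of fixed horizontal field of view and pixel count.

import math

def min_detectable_motion_m(distance_m, horizontal_fov_deg=57.0,
                            pixels=320, pixels_per_gesture=8):
    """Approximate smallest motion (meters) distinguishable at a given distance."""
    width_at_distance = 2.0 * distance_m * math.tan(math.radians(horizontal_fov_deg) / 2.0)
    meters_per_pixel = width_at_distance / pixels
    return meters_per_pixel * pixels_per_gesture

# A lower-resolution sensor needs larger gestures farther away:
print(round(min_detectable_motion_m(0.5), 3))  # roughly centimeter-scale finger motion up close
print(round(min_detectable_motion_m(3.0), 3))  # an arm-scale motion across the room
```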

In another example, the boundaries of each zone may be based on features of a display component. For example, it may be desirable that the gestures are natural user positions or motions with respect to the size of the display component. For a small display, such as display 214 on cellular device 211, it may be natural to the user to make small motions that correspond to the small display. For example, if the gesture corresponds to scrolling through a list displayed on the small screen, the gesture may comprise the user making small up and down motions with the user's finger such that the motion corresponds to the size of the list displayed on the small screen. Thus, the gestures for interacting with a small display, such as the gestures in zones 5, 6, and 7 associated with cellular device 211 in FIG. 2C, may be more generally defined by small motions. However, the gestures in zones 1, 2, 3, 4 for interacting with a large display, such as display 210, may be more generally defined by large motions. For example, whether the user is in close range or far range to a large projection screen, it may be more natural for the user's motion for scrolling through the list to correspond to the size of the list displayed on the large screen. If the distance in the physical space is a one-to-one translation, the screen size will have a direct impact on the scale of the gesture. For example, consider a gesture that corresponds to movement covering the distance from a top of a display portion of the display device to a bottom of the display portion of the display device. Thus, the set of gestures in a particular zone or across zones may vary according to the size of a display component.
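
A small sketch of the one-to-one translation mentioned above: the physical motion required to scroll a list is scaled to the height the list occupies on the active display, so a wall-sized screen asks for a larger sweep than a phone screen. The display heights and screen fraction are illustrative assumptions.

```python
# Sketch: physical hand travel required to scroll a displayed list once,
# scaled one-to-one with the on-screen height of the list.

def required_sweep_m(display_height_m, list_fraction_of_screen=0.8, translation=1.0):
    """Hand travel (meters) needed to scroll the full list under a one-to-one mapping."""
    return display_height_m * list_fraction_of_screen * translation

print(required_sweep_m(0.09))   # ~7 cm finger motion for a phone-sized display
print(required_sweep_m(1.10))   # ~88 cm arm motion for a large wall display
```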

The type of display component may influence the gestures applicable in a zone. For example, consider a tablet computing environment or a display component having touch screen capabilities such that the computing environment can detect the presence and location of a touch within the display area. Thus, when the user is in close enough range to the computing environment, it may be desirable that some gestures for control of the system 200 comprise direct contact with the system 200. At further distances from the display, such as in zone 2, 3, or 4, the gestures may be defined by non-contact motions or gestures based on other factors, such as what would be natural to the user at that distance or the resolution of the camera, for example.

The application executing on the system and/or the available system components may have a determinative effect on the zones and the boundaries of each zone. For example, a user may be interacting with a cellular device 211 to play a game. The user may need to be in close range to the computing environment to view the game on the screen, such as the small screen integrated into cellular device 211. Thus, it may be more natural for a user to use small hand/finger positions or motions when in close range, and the gestures in that zone may be defined accordingly. In another example, however, the user may be executing a presentation application. Perhaps the presentation is being displayed on a larger screen for an audience, but the capture device capable of capturing data representative of the user's gestures is integrated into the user's cellular device. The user may position the cellular device in the room such that the user is in the capture device's field of view, but the gestures may be large, full-body gestures to control aspects of the presentation displayed on the large screen. It may be more natural for the user to be standing or moving in the physical space and use larger gestures because of the application executing and the output to a larger display, even if the user is within a close range distance from the cell phone.

The system may be tuned such that user preferences can influence the boundaries of the zones or the gestures that apply in each of the zones. For example, the user may deactivate certain zones, the user may deactivate all zones, or the user may direct the system to use a standard set of gestures regardless of the user's position in the physical space. Consider an application in which it would be more natural to use larger motions at a further distance from the input sensor, and the zones are defined as such, and consider a user that is seated in a zone where the gestures in that zone comprise large body motions. However, the user's motion may be restricted, such as if the user is holding a small child or has a broken leg, for example. The user may direct the system to apply smaller zone 1 gestures to the entire physical space such that the user can perform the smaller gestures in the seated position but still interact with the system. The user preferences may therefore alter the applicable gestures and/or the zones for a specific user or other users of the system. The altered gestures may apply temporarily, such as for a current execution of an application, or the changes may apply for a longer period of time, such as until a user changes his or her preferences.

As noted, any suitable computing environment, including the capture device itself, may process the capture data. For example, the computing environment 210, 211 in FIG. 2A or the capture device 208 may have the processing capabilities to process the capture data. The computing environment may employ gesture recognition techniques on the capture data to determine if the user has performed any gestures in the physical space. If the physical space is divided into virtual zones, as shown in FIGS. 2A and 2B, the computing environment may first determine which zone is applicable to determine which gestures are applicable. Thus, any capable component of the system can determine the user's position and identify the corresponding virtual zone in the physical space based on any suitable method.

The positioning of the components in the system 200 may influence the manner in which the virtual zones are defined and the manner in which the user's position is determined. For example, the zones may be defined in relation to the capture device, which captures the depth of the user from the perspective of the capture device. However, the zones may be translated by the capture device to be defined from the perspective of another component in the system 200. By comparing the user's position relative to a component of the system 200, the system can interpret and translate the scale of an input gesture appropriately.

In an example embodiment, the zones are defined in relation to the capture device. It is contemplated that the capture device may be fixed in one position or it may be capable of moving to change the field of view. For example, the capture device 208 in FIG. 2A may have the ability to swivel left and right to track a user in the physical space such that the user remains in the camera's field of view. As the capture device moves to change the field of view, the zones may change accordingly to correspond to the perspective of the capture device. The computing system may identify, from the captured data, the distance of the user from the location of the capture device to determine the user's zone. Further, if the capture device is moved to a different location, the zones may be redefined with respect to the capture device location. In an example embodiment, the system may identify the preferred position of the capture device for gesture recognition and scaling, and the user can position the capture device accordingly.

In the example shown in FIG. 2C, a capture device 208 is positioned on the wall. Thus, the depth captured from the perspective of the capture device 208 may not be from the same perspective as the distance from the computing environment 210 or display 210, for example. However, the zones may be defined based on the positioning of a component other than the capture device, such as the computing environment or a display component. The capture data may indicate the user's depth or position in the physical space with respect to other system components. The system may analyze the captured data to identify the bounds of the physical space, such as identifying objects, walls, the floor, etc. The system can use the information about the physical space and translate it to reflect a different position in the physical space, such as from the display, a capture device positioned on the wall, a computing environment, or the like. For example, if the zones are defined based on the position of the display device, the distance of the user from the display device may be evaluated to determine the user's zone. Consider another example in which the capture device that captures the data is integrated into the computing environment, such as the cellular device 211 shown in FIGS. 2B and 2C. The zones may divide the physical space in the same or similar manner with respect to both the capture device and the computing environment, and thus minimal or no translation may be necessary for properly determining the user's zone.

In another example embodiment, the computing environment may extrapolate the user's position from the computing environment or the display based on the depth data captured from the perspective of the capture device. For example, the computing environment may include compensation for the position of the capture device in relation to the computing environment or the display. In FIG. 2C, the capture device 208 that captures the depth of the user may be positioned at some distance from the computing environment 211 and display 210. However, zone 1 may include the close-range region with respect to the display 210, including a touch/contact region. The computing environment may determine the user's depth from the perspective of the capture device. The system may use the difference in position between the capture device and the display 210 to determine the user's distance from the display 210. In another example, the system may translate the user's distance to correspond to a position and perspective of the display 210 such that the applicable zone is defined from the perspective of the display 210. By employing such techniques, or similar techniques, the system may compensate for the differences in position such that it appears to the user that he or she is interacting directly with the display.
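
The compensation described above can be sketched as a simple frame translation: a user position measured in the capture device's coordinate frame is re-expressed relative to the display, given a pre-measured offset of the display from the capture device, so that zones anchored to the display can be tested. A full calibration would also handle rotation; the translation-only version and the example numbers below are assumptions for illustration.

```python
# Sketch: re-express a camera-frame position relative to the display, then
# use that distance to pick the zone anchored to the display.

def to_display_frame(user_in_camera_xyz, display_in_camera_xyz):
    """Translate a camera-frame position into a display-anchored frame (ignores rotation)."""
    return tuple(u - d for u, d in zip(user_in_camera_xyz, display_in_camera_xyz))

def distance_to_display(user_in_camera_xyz, display_in_camera_xyz):
    relative = to_display_frame(user_in_camera_xyz, display_in_camera_xyz)
    return sum(c * c for c in relative) ** 0.5

# Example: camera on a side wall, display 1.2 m to its left along x.
user = (0.3, 0.0, 2.0)              # meters, in the camera's frame
display = (-1.2, 0.0, 0.0)          # display position, in the camera's frame
print(round(distance_to_display(user, display), 2))   # 2.5 m: distance used to pick the zone
```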

The system may resolve the user's position in the physical space with respect to any point in the physical space. For example, each device in the system can be calibrated to understand the physical space in terms of a three-dimensional coordinate system having x, y, and z axes with a common point of origin. From the captured data, the user's position may be defined on the common, virtual coordinate system representing the physical space, and the zones may also be defined with respect to the common coordinate system. The user's x, y, z position defined on the common coordinate system can therefore be compared to the boundaries of each zone. The use of a common coordinate system to characterize the physical space may make the position of the device capturing the depth of the user inconsequential to the determination of the user's position in the physical space because the system may calibrate each component accordingly.

Any suitable technique is contemplated for defining the boundaries in the physical space and for determining the user's position with respect to those boundaries. Location-based techniques may be employed, from simple distance and positioning based on a common coordinate system to techniques involving global positioning (e.g., GPS). For example, location information pertaining to the user may be received from a variety of types of position determining equipment having different underlying technologies, such as: GPS (Global Positioning System); angle of arrival (AOA); time difference of arrival (TDOA); Line of Sight (LOS); etc.

The system can understand input gesture data coming from disparate sensors. For example, if the user is using a system capable of understanding touch and three-dimensional inputs (such as a surface computing device fitted with a depth camera), a single gesture may comprise motion in more than one zone. For example, the gesture may comprise a left to right sweep of the left arm starting 10 cm out from the sensor, then comprise a touch of the screen during the apex of the curve, and finish with the arm to the right, again 10 cm out from the screen. This gesture input may be interpreted as a continuous left to right movement by the system even if the gesture comprises motion in both zones 1 and 2, for example. Similarly, a gesture may involve data captured from a capture device that may or may not need to use distance-scalable information to produce the optimal experience. For example, a gesture may comprise a combination of a voice command and a motion. The capture device may comprise an audio sensor that may identify a voice command that can control an aspect of the system.

As shown in FIG. 2C, a user 202, 230 may be in a physical space divided into various sets of zones, such as when a user is in view of multiple capture devices or if there are multiple computing environments receiving data from at least one capture device. More than one capture device may capture a user. For example, the system may switch between capture devices or merge captured data from more than one capture device. In FIG. 2C, for example, the user may be in a field of view of both capture devices 208 and 213. The system may merge data from the capture devices to increase the data analyzed for gesture recognition.

In an example, the user 202 or 230 shown in FIG. 2C may move out of range of the field of view of a first capture device 208 but into a field of view of a second capture device. For example, if a user 202 is moving between rooms in a house or office, the user 202 may move in and out of view of capture devices located in different rooms. In another example, the user may move out of range of capture device 208 but still be captured by capture device 213 on the user's cellular device 211. The system 200 may analyze the captured data from either capture device 208, 213 such that gesture recognition can continue seamlessly to the user despite the switch between capture devices.

The system may identify gestures for multiple users in a physical space, where the users may be in different zones with respect to the same component in the system 200. A capture device in the physical space may be focused on a respective user, or a capture device may capture data with respect to several users. The system may sort the data for each respective user for purposes of gesture recognition. For example, the system may correlate a portion of the capture data to a user based on the user's position in the physical space or an identity of the user recognized by the system.

The system may also intelligently transition between applications or interfaces based on a user's position in the physical space. For example, if user 230 moves from zone 2 into zone 4, the system may detect the movement between zones and change the application that is executing and/or switch between executing applications. Thus, the user's distance from a component in the system may control the application that is selected for execution or dictate the displayed interface/application on the display device. For example, user 230 could be typing on a keyboard close to the computing environment 212 while in zone 2, and the computing environment 212 is executing a media player and displaying a user interface for the media player on the display device. The user 230 may walk away from the keyboard and move into zone 4. The system may recognize the movement and change the display from the media player user interface to a media center interface or a different application that corresponds to the user's modified position. Thus, if an activity has complementary application interfaces that are associated with distance or position in the physical space, the system can automatically migrate from one experience to another based on the user's active distance from a component in the system, such as the capture device or display.

Similarly, the system may modify aspects of the executing application and/or corresponding display based on the user's distance. For example, if a user is interacting with a word processing document and is close to the computing environment, such as in zone 1 or 2, the system may display words on the screen in a small font. However, if the user backs up and moves to zone 3 or 4, for example, the font size may increase with the user's increased distance from the display. The user may select for this to happen, or it may be automatic.
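
A simple sketch of such distance-dependent rendering is shown below: font size grows roughly in proportion to the user's distance from the display so that text stays legible. The baseline point size, reference distance, and clamping limits are illustrative assumptions.

```python
# Sketch: scale a document's font size with the user's distance from the display.

def font_size_for_distance(h_meters, base_pt=11, base_distance_m=0.6,
                           min_pt=9, max_pt=64):
    """Scale a base font size with distance, clamped to a sensible range."""
    scaled = base_pt * (h_meters / base_distance_m)
    return int(max(min_pt, min(max_pt, scaled)))

print(font_size_for_distance(0.5))   # close range (zone 1/2): small font
print(font_size_for_distance(3.0))   # across the room (zone 4): large font
```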

The system 200 may identify the computing environments connected directly or otherwise connected via the network, such as computing environments 210, 211 and any remote computing environments that share information over a network with the local computing environments 210, 211. The captured data may be provided to multiple computing environments, and the computing environment(s) can process the data suitably for that environment. For example, the capture device 208 may provide data representative of either or both users 202, 230 to cellular phone 211, and cellular phone 211 can use the data to identify the user's position with respect to the zones applicable to the cellular phone. The zones may be specific to the cellular phone, such as the zones 5, 6, and 7 shown. Similarly, the capture device 208 may provide the same data representative of a user 202, 230 to computing environment 210, and computing environment 210 can use the data to identify the user's position with respect to the zones 1, 2, 3, and 4 applicable to the computing environment 210.

In FIG. 2C, the example set of zones, zones 1, 2, 3, and 4, may be defined from the perspective of display 210 and apply to both computing environments 210, 211. For example, consider the user 202 captured by capture device 208, where the user is positioned in close range to the cellular device 211 but across the room from the computing environment 210.

There may be a single set of zones that divide the physical space, where the single set of zones is applicable to more than one computing environment. For example, the set of zones 1, 2, 3, and 4 may apply to both computing environments 210, 211 such that the gestures in zone 4 apply to both computing environments 210, 211. Consider, for example, that display 210 is displaying a virtual calculator and the user 202 gestures with respect to display 210 to select numbers on the displayed calculator. The same gestures may be applicable to the cellular phone 211. A gesture in zone 4 may comprise movements of a pointed finger, where a pointer displayed on the screen corresponds to the motion such that the user can point to numbers on the calculator via the moving pointer. A gesture for the selection of a number may comprise a clutching motion once the pointer is on the desired number. The system may recognize that the user is in zone 4 and recognize the gesture as applicable to zone 4 for the particular control of number selection. The user may view numbers on a small display screen of cellular device 211. The same scale gesture comprising moving a finger around in the physical space may control a pointer on the display screen of the phone 211. Thus, the user may move the pointer on the screen of device 211 and select a number using the same gestures.

The sets of zones may divide the physical space differently for different computing environments. Thus, multiple sets of similarly defined zones may be positioned differently in the physical space to correspond to a particular computing device, but the number of zones and gestures applicable in each zone may be similar between multiple computing environments. For example, zones 1 and 5 (the zones that are in close range to each respective computing environment, 210 and 211) are the same, including the same set of gestures at close range.

Alternately, the gestures in each zone may vary between different sets of zones depending on various factors, such as the type of computing environment, camera resolution, the size of the display, the type of applications that execute on the computing environment, the typical use of the particular computing environment, or the like. For example, the zones applicable to cellular phone 211 may be a different set than those applicable to computing environment 210. Thus, the gestures defined across zones 5, 6, and 7, applicable to the cellular phone, may be different from the gestures defined across zones 1, 2, 3, and 4, applicable to the computing environment 210. The gestures applicable to the cell phone, for example, may be mostly small finger gestures at varying distances throughout zones 5, 6, and 7, and the gestures applicable to the computing environment 210 may be mostly large body gestures at varying distances in zones 1, 2, 3, and 4.

The gestures in the different sets of zones may be sufficiently distinct, thus lessening the risk that computing environment 210 will interpret a user's gesture that is intended for the cellular phone 211, and vice versa. For example, the zones may be arranged such that the user is in zone 5 with respect to the cellular phone 211 and in zone 3 with respect to the computing environment 210. The user's gestures in zone 5, at close range to cellular phone 211, may be small finger gestures or touch screen gestures. If the gestures in zone 3 applicable to computing environment 210 are large gestures that involve the arms and legs, for example, the user's gestures may be recognized by the cellular phone 211 and not recognized by the computing environment 210.

In another example embodiment, a set of zones or the computing environment itself may be set as active or inactive. An inactive computing environment may not analyze the captured data for gesture recognition or may not even receive captured data. For example, the capture device(s) may identify an active computing environment and provide the captured data to the active computing environment. If the computing environment receives the captured data, the zones may be inactive such that the captured data is not analyzed or is analyzed with respect to a default set of gesture data.

Multiple computing environments may be active. Thus, a user's gesture may control aspects of multiple computing environments. In an example embodiment, gestures may inherently control the appropriate computing environment based on the distinct sets of gestures between computing environments. In another example embodiment, a gesture may comprise an indication of the computing environment to which the gesture applies. For example, the system could identify each computing environment in the physical space, or otherwise connected to the system, by number. Prior to a control gesture, the user could perform an indicator gesture, such as waving the number of fingers that corresponds to the number of the computing environment of interest. The gesture following the number may be applied to the selected computing environment. The selected computing environment could remain as the primary computing environment until the user changes it or selects a different computing environment as the primary. Thus, the user may not have to perform the indicator gesture each time, but only when switching between active computing environments.

Multiple capture devices may be present in a physical space. The captured data from multiple capture devices may be merged. For example, a first capture device may capture data representative of the user, and a second capture device may capture data representative of the same user. The data from the first and second capture devices may be shared with other components in the system 200, such as the computing environment. The computing environment, for example, may analyze the captured data from both capture devices and merge the data. The captured data may comprise a series of images with timestamps. Each capture device may capture data at varying timestamps, so combining the data may increase the number of images that represent the user.
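A minimal sketch of the merge described above, assuming each capture device tags its frames with a timestamp; the frame structure and field names are hypothetical, not part of this disclosure.

    # Hypothetical sketch: merge timestamped frames from two capture devices
    # into a single, denser sequence of images representing the user.

    from dataclasses import dataclass

    @dataclass
    class Frame:
        timestamp: float   # seconds on a shared clock
        device_id: str     # which capture device produced the frame
        image: object      # depth and/or RGB data for the frame

    def merge_capture_data(frames_a: list[Frame], frames_b: list[Frame]) -> list[Frame]:
        """Interleave frames from two devices in time order."""
        return sorted(frames_a + frames_b, key=lambda f: f.timestamp)

    # Because the devices sample at different instants, the merged stream
    # contains more images per second of the user than either device alone.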

In another example embodiment, at least one capture device may be selected as the primary capture device for capturing data representative of the user. As shown in FIG. 2C, an example first capture device 208 is positioned on the wall and an example second capture device is integrated into the cellular phone 211. The capture devices may communicate with each other, such as via Bluetooth communication or another network connection. The captured data from each capture device may be shared and provided to a computing environment, such as 210 or 211. In an example embodiment, the captured data is merged and a computing environment can analyze the merged data for gesture recognition. In another example embodiment, a capture device may be selected for capturing and providing the data representative of the user's gestures. The capture device selected out of multiple capture devices may be selected based on a variety of factors including, but not limited to, its resolution, battery life, view of the user, networking capabilities, etc. For example, despite having an integrated capture device, computing environment 211 may select to receive data from capture device 208 if capture device 208 has better resolution.

The user may set the active or inactive status of a capture device, computing environment, or zones. The user may set the active or inactive status via a gesture. For example, a gesture for causing an inactive status for a computing environment may comprise an open hand facing the computing environment of interest, and slowly motioning downward with straight fingers. The reverse may activate the zones.

Consider, for example, a user's cellular phone 211 that may be on a table next to the user while the user is playing a game executing on computing environment 210. While not intended by the user, a user's gesture with respect to computing environment 210 could be analyzed by and/or applicable to the cell phone 211. Thus, the user may unintentionally cause the cellular phone 211 to perform some function based on the user's gestures intended for computing environment 210. The user may wish to deactivate or set the zones to inactive for cellular device 211 to avoid unintentional controls. The system may prompt the user to identify the computing environments to receive and/or process captured data. For example, computing environment 210 may receive an indication that a second computing environment 211 is nearby. The computing environment may output a list of identified computing environments on the display 210 and ask the user to identify the pertinent computing environments for a session.

Multiple users may be present in the physical space. A capture device may track multiple users or may be dedicated to capturing data for a single user. The cellular device 211 may identify the user via body or voice recognition techniques, for example, and associate the captured data with the particular user.

It is contemplated that a single device may perform all of the functions in system 200, or any combination of suitable devices may perform the desired functions. For example, the computing device 210 may provide the functionality described with respect to the computing environment 212 shown in FIGS. 2A-2C or the computer described below in FIG. 7. The computing device 210 may also comprise its own camera component or may be coupled to a device having a camera component, such as capture device 208.

It is contemplated that any number of computing environments and any number of capture devices may be connected. For example, various devices or components in a gesture-based system may handshake with each other via a network and communicate to share information. The capture device and each computing environment may communicate over a wired or wireless connection, such as via a cable connection, a Wi-Fi connection, or a home wireless network. The various capture devices may share the captured data with each other or provide it to a computing environment for processing and interpretation for control of an aspect of the gesture-based system. An example network setup is described in more detail below with respect to FIG. 3.

The components of a networked system can share information locally within a location or remotely across locations. In an example embodiment, the local computing environment receives the data representative of the user from the capture device. The computing environment may output to a local display, such as a display component of the computing environment or another display device otherwise connected to the computing environment 302. The computing environment may alternately or also provide the data to a remote computing environment or a remote display component for display. For example, a first computing environment may communicate with a second computing environment over a network. The second computing environment may receive data from the first computing environment and map the gestures of the user to a display component local to the second computing environment.

Thus, by communicating over a network 250, any number of users may interact with a plurality of other users via gestures. For example, gestures performed in a first location can be translated and mapped to a display in a plurality of locations, including the first location. As described above in the examples shown in FIGS. 2A-2C, zones may be defined in a physical space to indicate the gestures applicable at varying distances. As described above, the zones may be defined based on a number of factors, including a camera resolution, a type of computing environment, an executing application, etc. Thus, the zones 351 in Location #1 may divide the physical space differently than the zones 352 in Location #2 or the zones 353 in Location #3, based on a variety of factors. For example, each capture device may have a varying resolution or distance-related capability, and each location may have its own zones of gesture data. A user in each location may gesture appropriately for the zone in which the user is positioned, which may be tailored to the capture device with which the user interacts. For example, a user 314 in location #3 may interact with the mobile handheld device 304 at a close range. A user 322 in location #1 may interact with a computing environment 302, such as a personal computer, at longer distances. A gesture by different users at different distances from different computing environments, comprising different motion or user position, may issue the same command to the system.

FIG. 3A depicts an example flow diagram for a method of identifying a user's gesture from captured data with respect to a plurality of virtual zones. For example, any gesture-based system such as that shown in FIGS. 1-3 may perform the operations shown here.

At 302, a system may receive data from a physical space that includes a target, such as a user or a non-human object. As described above, a capture device can capture data of a scene, such as the depth image of the scene, and scan targets in the scene. The capture device may determine whether one or more targets in the scene correspond to a human target such as a user. For example, to determine whether a target or object in the scene corresponds to a human target, each of the targets may be flood filled and compared to a pattern of a human body model. Each target or object that matches the human body model, such as a target identified as a human, may then be scanned to generate a skeletal model associated therewith. The skeletal model may then be provided to the computing environment for tracking the skeletal model and rendering a visual representation associated with the skeletal model.

Any known technique or technique disclosed herein that provides the ability to scan a known/unknown object, scan a human, and scan background aspects in a scene (e.g., floors, walls) may be used to detect features of a target in the physical space. The scan data for each, which may include a combination of depth and RGB data, may be used to create a three-dimensional model of the object. The RGB data is applied to the corresponding area of the model. Temporal tracking, from frame to frame, can increase confidence and adapt the object data in real time. Thus, the object properties and tracking of changes in the object properties over time may be used to reliably track objects that change in position and orientation from frame to frame in real time. The capture device captures data at interactive rates, increasing the fidelity of the data and allowing the disclosed techniques to process the raw depth data, digitize the objects in the scene, extract the surface and texture of the object, and perform any of these techniques in real time such that the display can provide a real-time depiction of the scene. Further, the capture data may be captured by a plurality of capture devices. Thus, a collection of data representative of the user's gestures, from various sources, may be merged. The collection of data may comprise images taken at different times, and thus combining the data may provide increased fidelity in the data representative of the user.

The system may identify a position of the user in the physical space from the captured data at 304. Alternately, at 306, the position of the user may be defined relative to a plurality of virtual zones. A virtual zone is a virtual space representative of a portion of the physical space. The gesture-based system may compare the captured data to gesture data associated with one of the plurality of virtual zones at 308.

In another example, the system may correlate a position of the user at 308 to a plurality of virtual zones. For example, the user's position in the physical space may correlate to a plurality of virtual zones at 306. The gesture data associated with a virtual zone may be a set of gesture data, where each virtual zone has a respective set of gesture data associated with it. If the user's position in the physical space correlates to a plurality of virtual zones, a preferred detection order may be applied at 310. The preferred detection order may order the plurality of virtual zones in a certain manner. The preferred detection order may be based on the user's distance from each of the plurality of virtual zones.

The gesture data compared to the captured data, at 312, may be associated with a virtual zone that correlates to the user's position in the physical space. For example, as described above, the user's position in the physical space may be identified as being within a boundary of one of the virtual zones. Consider an example in which four virtual zones, zone 1, zone 2, zone 3, and zone 4, represent four portions of the physical space. The user in this example may be positioned within the boundaries of zone 3. The user's position may be identified as within the boundaries of zone 3 and, thus, the set of gesture data associated with zone 3 may be used for comparison to the captured data. It is also noted that zones may overlap; thus, two zones may comprise at least the same portion of the physical space, and a user may be positioned in two virtual zones at the same time.

Consider the example above, where four virtual zones, zone 1, zone 2, zone 3, and zone 4, represent four portions of the physical space. The user's position may be identified as within the boundaries of zone 3, and thus the user's position may correlate to virtual zone 3. However, the user's position may correlate to other virtual zones. For example, a user's position may correlate to a virtual zone if the position of the user is within the portion of the physical space represented by the virtual zone, the position of the user is within a predetermined distance from the portion of the physical space represented by the virtual zone, or the position of the user is adjacent to a boundary of the virtual zone. The correlation to more than one zone enables the system to adjust gesture recognition to a zone that may not directly correlate to the user's position. In this manner, if the user is between zones or near the boundary between two zones, the gesture recognition techniques may be flexible and account for the user's gesture intended for a nearby zone.

The preferred detection order may be determined based on a probabilistic approach. For example, the zone in which the user is positioned may be first in the preferred detection order, as the zone in which the user is positioned is most probably representative of the set of gestures that apply to the user's position. However, the user's gesture may not register with the gesture data associated with that zone, or it may register but the system may output a low confidence rating. The system may then compare the captured data to a set of gesture data associated with the next virtual zone based on the preferred detection order. The next virtual zone in the order, for example, may be the next closest zone to the user's position, where the user's position is close to a boundary of the virtual zone. Thus, the preferred detection order may be an order of virtual zones that corresponds to increasing distance from the user's position. A user's distance from a virtual zone may be defined by the user's position from a central point of the zone or a boundary of the zone. The virtual zone may correspond to the user's position if the distance is within a predetermined value, such as one set by the system or by the user.
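One plausible way to compute such a preferred detection order, assuming zones are modeled as simple distance ranges from the capture device, is sketched below; the zone model, boundary-gap threshold, and names are illustrative assumptions rather than the method defined by this disclosure.

    # Hypothetical sketch: order virtual zones by their distance from the
    # user's position so that closer zones are tried first for recognition.

    from dataclasses import dataclass

    @dataclass
    class Zone:
        name: str
        near_m: float   # inner boundary, meters from the capture device
        far_m: float    # outer boundary, meters from the capture device

        def distance_to(self, user_distance_m: float) -> float:
            """0 if the user is inside the zone, else distance to the nearest boundary."""
            if self.near_m <= user_distance_m <= self.far_m:
                return 0.0
            return min(abs(user_distance_m - self.near_m),
                       abs(user_distance_m - self.far_m))

    def preferred_detection_order(zones: list[Zone], user_distance_m: float,
                                  max_boundary_gap_m: float = 0.5) -> list[Zone]:
        """Zones containing or near the user, closest first."""
        candidates = [z for z in zones
                      if z.distance_to(user_distance_m) <= max_boundary_gap_m]
        return sorted(candidates, key=lambda z: z.distance_to(user_distance_m))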

The user's position may correlate to more than one zone if the user's position changes and crosses a boundary between virtual zones within a predetermined time period. For example, the system may detect that the user moves in the physical space from a first virtual zone into a second virtual zone during the middle of a gesture. In an example embodiment, the first virtual zone in the preferred detection order may be the initial virtual zone that corresponds to the user's position at the beginning of the gesture.

In another example embodiment, the system may identify both zones, compare the user's gesture to the sets of gesture data applicable to each of the virtual zones, and identify the gesture from the combination of gesture data sets. In this manner, the system may identify the user's gesture as the gesture that corresponds best, e.g., has a higher confidence rating that the gesture was performed, as a result of a comparison of the captured data to both sets of gesture data. Thus, the preferred detection order of the plurality of zones may be determined based on the confidence rating or level of correlation between the captured data representative of the user's gesture and the gesture data in the sets of gesture data associated with each of the plurality of virtual zones.
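Continuing the sketch above, the comparison against several candidate zones might simply keep the match with the highest confidence rating; the `match` callable below is a stand-in for whatever recognizer the system actually uses, and every name here is an assumption.

    # Hypothetical sketch: compare captured data against the gesture sets of
    # several candidate zones and keep the highest-confidence match.

    def recognize_across_zones(captured_data, ordered_zones, gesture_sets, match):
        """gesture_sets maps a zone name to its gesture data;
        match(captured, gesture) returns a confidence between 0.0 and 1.0."""
        best_gesture, best_confidence = None, 0.0
        for zone in ordered_zones:
            for gesture in gesture_sets.get(zone.name, []):
                confidence = match(captured_data, gesture)
                if confidence > best_confidence:
                    best_gesture, best_confidence = gesture, confidence
        return best_gesture, best_confidence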

By comparing the captured data that is representative of the user's gesture to gesture data for a virtual zone, a selected set of gesture data, or multiple sets of gesture data, the system may identify a user's gesture at 314. For example, to determine if a gesture was performed, the gesture recognition may use gesture information from a gesture profile personalized for a user to identify the user's gestures. The identification of the user's gesture may be performed in real time with respect to the rate of data capture of the user's gesture in the physical space. Thus, the recognition of the gesture, and therefore the corresponding control of the system, may occur in real time and appear seamless to the user. The user's gesture may correspond to a control of an aspect of the system. Thus, within the varying virtual zones that represent the physical space, varying sets of gestures may apply within each zone, and the user's position in the physical space may control which gestures will register with a control of the system.

The system may detect a change in the user's position at 316 and transition to a different set of gesture data, either by way of 318 or 320. The transition may be done in real time and it may be seamless to the user. Thus, as the user moves around in the physical space, different gestures may be recognized by the system for the same control. The varying sets of gesture data enable the system to recognize gestures of different scales, as a particular scale of gestures may be more natural for a user depending on the user's position in the physical space. For example, when the user is close to the screen, it may be more natural for the user to touch the screen or use small finger-scale gestures. However, at larger distances, it may be more natural for the user to make large-scale gestures that comprise more parts of the body, for example.

FIG. 3B depicts an example flow diagram for a method of associating gesture data to virtual zones in a physical space. At 330, the system may capture data representative of a physical space. The system may identify virtual zones at 332 and apply them to the physical space such that each of the virtual zones is representative of a respective portion of the physical space.

The set of gesture data associated with a virtual zone may be based on the type of input available within the bounds of the virtual zone. For example, gestures defined by touch screen inputs may comprise a set of gesture data within a virtual zone that is defined in a portion of the physical space that is in a contact region with a component of the system.

As described above, the virtual zones identified at 332 may be defined from the perspective of a component in the gesture-based system. In an example embodiment, the system may capture data representative of the physical space and partition the physical space into virtual zones such that a plurality of virtual zones define the physical space. In another example embodiment, the virtual zones may be predetermined such that the system applies the virtual zones regardless of the size of the physical space. For example, a capture device may have a resolution of 10 m, where within a 10 m radius from the capture device the capture device is able to capture data from the physical space sufficient for gesture recognition. The virtual zones may be defined with respect to the resolution of the capture device. For example, there may be 10 zones, each spanning a radial distance of 1 m from the capture device. However, the physical space may be smaller than the space that corresponds to the resolution of the capture device. Thus, only a portion of the possible virtual zones may apply.
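A simple sketch of that partitioning follows, using the 10 m range and 1 m bands from the example above and clipping to the measured depth of the room; the function names and room size are hypothetical.

    # Hypothetical sketch: define radial virtual zones out to the capture
    # device's usable range, then keep only the zones that fit the room.

    CAPTURE_RANGE_M = 10.0   # device can resolve gestures out to 10 m
    ZONE_DEPTH_M = 1.0       # each zone spans 1 m of radial distance

    def build_zones(room_depth_m: float) -> list[tuple[float, float]]:
        """Return (near, far) boundaries for each zone that fits in the room."""
        zones = []
        near = 0.0
        while near < min(CAPTURE_RANGE_M, room_depth_m):
            far = min(near + ZONE_DEPTH_M, CAPTURE_RANGE_M, room_depth_m)
            zones.append((near, far))
            near = far
        return zones

    def zone_for_distance(zones, user_distance_m: float) -> int | None:
        """Index of the zone containing the user, or None if out of range."""
        for i, (near, far) in enumerate(zones):
            if near <= user_distance_m < far:
                return i
        return None

    # A 6.5 m room yields seven zones; only a portion of the ten possible
    # zones apply, as described above.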

For each virtual zone that applies to the physical space, a respective set of gesture data may be associated at 334. Thus, in the varying virtual zones that represent portions of the physical space, varying gestures may apply. The gestures in a set of gesture data may be scaled proportionate to the distance of the associated virtual zone from a component in the gesture-based system, such as from the capture device, for example. The gesture data in different sets of gesture data may comprise overlapping gesture data. Alternately, each set of gesture data may be unique to a specific virtual zone. Typically, a gesture corresponds to a control of the gesture-based system. For example, a gesture may correspond to a driving gesture or a gesture to open a file. By separating the physical space into virtual zones, gestures may be defined differently in different zones but correspond to the same control. For example, in a zone that comprises a contact region with a display screen, the gesture for opening a file may comprise a touch screen gesture. However, in a zone further from the screen, a different gesture, such as a hand or arm motion, may also correspond to the same control for opening a file.
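The association at 334 might be represented as a simple mapping from zone to gesture set, so that different gestures in different zones resolve to the same control; the zone names, gesture names, and control names below are illustrative assumptions.

    # Hypothetical sketch: each zone has its own gesture set, and gestures
    # from different zones can map to the same control, e.g. "open_file".

    GESTURE_SETS = {
        "zone_contact": {"touch_tap_icon": "open_file",
                         "touch_pinch": "zoom"},
        "zone_far":     {"arm_sweep_forward": "open_file",
                         "both_arms_out": "zoom"},
    }

    def control_for_gesture(zone: str, gesture: str) -> str | None:
        """Resolve a recognized gesture in a zone to the control it issues."""
        return GESTURE_SETS.get(zone, {}).get(gesture)

    # A touch tap near the screen and an arm sweep across the room both
    # issue the "open_file" control.
    assert control_for_gesture("zone_contact", "touch_tap_icon") == "open_file"
    assert control_for_gesture("zone_far", "arm_sweep_forward") == "open_file"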

A user may change preferences for the set of gesture data that is associated with a virtual zone. For example, if a zone further away from the capture device comprises large-scale gestures that a user cannot perform, the user may modify the set of gesture data to comprise small-scale gestures.

In an example embodiment, a computer readable storage media can store executable instructions for performing the techniques disclosed herein, such as those described in FIGS. 6 and 7. For example, a computer readable storage media can be a part of the computing environment; however, in other embodiments the computer readable storage media could be a part of the capture device. The instructions may provide for gesture recognition in the context of a plurality of virtual zones, as described above. The instructions may also provide for applying virtual zones to a physical space such that each of the virtual zones represents a portion of the physical space.

FIG. 4 illustrates an example embodiment of the capture device 202 that may be used for target recognition, analysis, and tracking, where the target can be a user or an object. According to an example embodiment, the capture device 202 may be configured to capture video with depth information, including a depth image that may include depth values, via any suitable technique including, for example, time-of-flight, structured light, stereo image, or the like. According to one embodiment, the capture device 202 may organize the calculated depth information into “Z layers,” or layers that may be perpendicular to a Z axis extending from the depth camera along its line of sight.

As shown in FIG. 4, the capture device 202 may include an image camera component 22. According to an example embodiment, the image camera component 22 may be a depth camera that may capture the depth image of a scene. The depth image may include a two-dimensional (2-D) pixel area of the captured scene where each pixel in the 2-D pixel area may represent a depth value such as a length or distance in, for example, centimeters, millimeters, or the like of an object in the captured scene from the camera.

As shown in FIG. 4, according to an example embodiment, the image camera component 22 may include an IR light component 24, a three-dimensional (3-D) camera 26, and an RGB camera 28 that may be used to capture the depth image of a scene. For example, in time-of-flight analysis, the IR light component 24 of the capture device 202 may emit an infrared light onto the scene and may then use sensors (not shown) to detect the backscattered light from the surface of one or more targets and objects in the scene using, for example, the 3-D camera 26 and/or the RGB camera 28. In some embodiments, pulsed infrared light may be used such that the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the capture device 202 to a particular location on the targets or objects in the scene. Additionally, in other example embodiments, the phase of the outgoing light wave may be compared to the phase of the incoming light wave to determine a phase shift. The phase shift may then be used to determine a physical distance from the capture device 202 to a particular location on the targets or objects.
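The round-trip timing and phase-shift measurements described above reduce to a short calculation. The sketch below assumes the speed of light and a known modulation frequency; the function names are hypothetical, and the constants are standard physics rather than values from this disclosure.

    # Hypothetical sketch of the time-of-flight arithmetic: light travels to
    # the target and back, so distance is half the round trip.

    import math

    SPEED_OF_LIGHT_M_S = 299_792_458.0

    def distance_from_pulse(round_trip_s: float) -> float:
        """Distance in meters from the measured pulse round-trip time."""
        return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

    def distance_from_phase(phase_shift_rad: float, modulation_hz: float) -> float:
        """Distance in meters from the phase shift of a modulated light wave."""
        return SPEED_OF_LIGHT_M_S * phase_shift_rad / (4.0 * math.pi * modulation_hz)

    # Example: a 20 ns round trip corresponds to roughly 3 m.
    print(distance_from_pulse(20e-9))   # ~2.998 m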

According to another example embodiment, time-of-flight analysis may be used to indirectly determine a physical distance from the capture device 202 to a particular location on the targets or objects by analyzing the intensity of the reflected beam of light over time via various techniques including, for example, shuttered light pulse imaging.

In another example embodiment, the capture device 202 may use structured light to capture depth information. In such an analysis, patterned light (i.e., light displayed as a known pattern such as a grid pattern or a stripe pattern) may be projected onto the scene via, for example, the IR light component 24. Upon striking the surface of one or more targets or objects in the scene, the pattern may become deformed in response. Such a deformation of the pattern may be captured by, for example, the 3-D camera 26 and/or the RGB camera 28 and may then be analyzed to determine a physical distance from the capture device 202 to a particular location on the targets or objects.

According to another embodiment, the capture device 202 may include two or more physically separated cameras that may view a scene from different angles to obtain visual stereo data that may be resolved to generate depth information. In another example embodiment, the capture device 202 may use point cloud data and target digitization techniques to detect features of the user.

The capture device 202 may further include a microphone 30, or an array of microphones. The microphone 30 may include a transducer or sensor that may receive and convert sound into an electrical signal. According to one embodiment, the microphone 30 may be used to reduce feedback between the capture device 202 and the computing environment 212 in the target recognition, analysis, and tracking system 10. Additionally, the microphone 30 may be used to receive audio signals that may also be provided by the user to control applications such as game applications, non-game applications, or the like that may be executed by the computing environment 212.

In an example embodiment, the capture device 202 may further include a processor 32 that may be in operative communication with the image camera component 22. The processor 32 may include a standardized processor, a specialized processor, a microprocessor, or the like that may execute instructions that may include instructions for receiving the depth image, determining whether a suitable target may be included in the depth image, converting the suitable target into a skeletal representation or model of the target, or any other suitable instruction. For example, the computer-readable medium may comprise computer executable instructions for receiving data of a scene, wherein the data includes data representative of the target in a physical space. The instructions may comprise instructions for gesture profile personalization and gesture profile roaming, as described herein.

The capture device 202 may further include a memory component 34 that may store the instructions that may be executed by the processor 32, images or frames of images captured by the 3-D camera 26 or RGB camera 28, or any other suitable information, images, or the like. According to an example embodiment, the memory component 34 may include random access memory (RAM), read only memory (ROM), cache, Flash memory, a hard disk, or any other suitable storage component. As shown in FIG. 4, in one embodiment, the memory component 34 may be a separate component in communication with the image capture component 22 and the processor 32. According to another embodiment, the memory component 34 may be integrated into the processor 32 and/or the image capture component 22.

As shown in FIG. 4, the capture device 202 may be in communication with the computing environment 212 via a communication link 36. The communication link 36 may be a wired connection including, for example, a USB connection, a Firewire connection, an Ethernet cable connection, or the like and/or a wireless connection such as a wireless 802.11b, g, a, or n connection. According to one embodiment, the computing environment 212 may provide a clock to the capture device 202 that may be used to determine when to capture, for example, a scene via the communication link 36.

Additionally, the capture device 202 may provide the depth information and images captured by, for example, the 3-D camera 26 and/or the RGB camera 28, and a skeletal model that may be generated by the capture device 202, to the computing environment 212 via the communication link 36. The computing environment 212 may then use the skeletal model, depth information, and captured images to, for example, control an application such as a game or word processor. For example, as shown in FIG. 4, the computing environment 212 may include a gestures library 192.

As shown in FIG. 4, the computing environment 212 may include a gestures library 192 and a gestures recognition engine 190. The gestures recognition engine 190 may include a collection of gesture filters 191. A filter may comprise code and associated data that can recognize gestures or otherwise process depth, RGB, or skeletal data. Each filter 191 may comprise information defining a gesture along with parameters, or metadata, for that gesture. For instance, a throw, which comprises motion of one of the hands from behind the rear of the body to past the front of the body, may be implemented as a gesture filter 191 comprising information representing the movement of one of the hands of the user from behind the rear of the body to past the front of the body, as that movement would be captured by a depth camera. Parameters may then be set for that gesture. Where the gesture is a throw, a parameter may be a threshold velocity that the hand has to reach, a distance the hand must travel (either absolute, or relative to the size of the user as a whole), and a confidence rating by the recognizer engine that the gesture occurred. These parameters for the gesture may vary between applications, between contexts of a single application, or within one context of one application over time.
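A rough sketch of such a filter for the throw example, parameterized by the threshold velocity and travel distance named in the text and returning a confidence; the data structures and default values are assumptions, not the engine's actual interfaces.

    # Hypothetical sketch: a "throw" gesture filter parameterized by a
    # threshold hand velocity and a minimum distance the hand must travel.

    from dataclasses import dataclass

    @dataclass
    class ThrowParameters:
        min_hand_velocity_m_s: float = 2.0   # hand speed that must be reached
        min_travel_m: float = 0.4            # distance the hand must cover
                                             # (could also be relative to user size)

    def evaluate_throw(hand_positions: list[tuple[float, float, float]],
                       frame_dt_s: float,
                       params: ThrowParameters) -> float:
        """Return a confidence in [0, 1] that a throw occurred, given hand
        positions sampled once per frame."""
        if len(hand_positions) < 2:
            return 0.0
        travel = 0.0
        peak_velocity = 0.0
        for (x0, y0, z0), (x1, y1, z1) in zip(hand_positions, hand_positions[1:]):
            step = ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
            travel += step
            peak_velocity = max(peak_velocity, step / frame_dt_s)
        velocity_score = min(peak_velocity / params.min_hand_velocity_m_s, 1.0)
        travel_score = min(travel / params.min_travel_m, 1.0)
        return velocity_score * travel_score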

While it is contemplated that the gestures recognition engine 190 may include a collection of gesture filters, where a filter may comprise code or otherwise represent a component for processing depth, RGB, or skeletal data, the use of a filter is not intended to limit the analysis to a filter. The filter is a representation of an example component or section of code that analyzes data of a scene received by a system and compares that data to base information that represents a gesture. As a result of the analysis, the system may produce an output corresponding to whether the input data corresponds to the gesture. The base information representing the gesture may be adjusted to correspond to a recurring feature in the history of data representative of the user's captured motion. The base information, for example, may be part of a gesture filter as described above. However, any suitable manner for analyzing the input data and gesture data is contemplated.

In an example embodiment, a gesture may be recognized as a trigger for entry into a modification mode, where a user can modify gesture parameters in the user's gesture profile. For example, a gesture filter 191 may comprise information for recognizing a modification trigger gesture. If the modification trigger gesture is recognized, the application may go into a modification mode. The modification trigger gesture may vary between applications, between systems, between users, or the like. For example, the modification trigger gesture in a tennis gaming application may not be the same as the modification trigger gesture in a bowling game application.

The data captured by the cameras 26, 28 and device 202 in the form of the skeletal model and movements associated with it may be compared to the gesture filters 191 in the gestures library 192 to identify when a user (as represented by the skeletal model) has performed one or more gestures. Thus, inputs to a filter such as filter 191 may comprise things such as joint data about a user's joint position, like angles formed by the bones that meet at the joint, RGB color data from the scene, and the rate of change of an aspect of the user. As mentioned, parameters may be set for the gesture. Outputs from a filter 191 may comprise things such as the confidence that a given gesture is being made, the speed at which a gesture motion is made, and a time at which the gesture occurs.

The computing environment 212 may include a processor 195 that can process the depth image to determine what targets are in a scene, such as a user 18 or an object in the room. This can be done, for instance, by grouping together pixels of the depth image that share a similar distance value. The image may also be parsed to produce a skeletal representation of the user, where features, such as joints and tissues that run between joints, are identified. There exist skeletal mapping techniques to capture a person with a depth camera and from that determine various spots on that user's skeleton: joints of the hand, wrists, elbows, knees, nose, ankles, shoulders, and where the pelvis meets the spine. Other techniques include transforming the image into a body model representation of the person and transforming the image into a mesh model representation of the person.

In an embodiment, the processing is performed on the capture device 202 itself, and the raw image data of depth and color (where the capture device 202 comprises a 3-D camera 26) values are transmitted to the computing environment 212 via link 36. In another embodiment, the processing is performed by a processor 32 coupled to the camera 402 and the parsed image data is then sent to the computing environment 212. In still another embodiment, both the raw image data and the parsed image data are sent to the computing environment 212. The computing environment 212 may receive the parsed image data but may still receive the raw data for executing the current process or application. For instance, if an image of the scene is transmitted across a computer network to another user, the computing environment 212 may transmit the raw data for processing by another computing environment.

The computing environment 212 may use the gestures library 192, along with the gesture sets 205a, 205b . . . 205n in each of the virtual zones 205 such as those shown in FIG. 2A, to interpret movements of the skeletal model and to control an application based on the movements. The computing environment 212 can model and display a representation of a user, such as in the form of an avatar or a pointer on a display, such as in a display device 193. Display device 193 may include a computer monitor, a television screen, or any suitable display device. For example, a camera-controlled computer system may capture user image data and display user feedback on a television screen that maps to the user's gestures. The user feedback may be displayed as an avatar on the screen such as shown in FIG. 1. The avatar's motion can be controlled directly by mapping the avatar's movement to the user's movements. The user's gestures may be interpreted to control certain aspects of the application.

According to an example embodiment, the target may be a human target in any position such as standing or sitting, a human target with an object, two or more human targets, one or more appendages of one or more human targets, or the like, that may be scanned, tracked, modeled and/or evaluated to generate a virtual screen, compare the user to one or more stored profiles, and/or store a gesture profile associated with the user in a computing environment such as computing environment 212. The gesture profile may be specific to a user, application, or system. The gesture profile may be accessible via an application or be available system-wide, for example. The gesture profile may include lookup tables for loading specific user profile information. The virtual screen may interact with an application that may be executed by the computing environment 212 described above with respect to FIG. 1.

The gesture profile may include user identification data such as, among other things, the target's scanned or estimated body size, skeletal models, body models, voice samples or passwords, the target's gender, the target's age, previous gestures, target limitations, and standard usage by the target of the system, such as, for example, a tendency to sit, left- or right-handedness, or a tendency to stand very near the capture device. This information may be used to determine if there is a match between a target in a capture scene and one or more users. If there is a match, the gesture profiles for the user may be loaded and, in one embodiment, may allow the system to adapt the gesture recognition techniques to the user, or to adapt other elements of the computing or gaming experience according to the gesture profile.

One or more gesture profiles may be stored in computer environment 212 and used in a number of user sessions, or one or more profiles may be created for a single session only. Users may have the option of establishing a profile where they may provide information to the system such as a voice or body scan, age, personal preferences, right or left handedness, an avatar, a name, or the like. Gesture profiles may also be generated or provided for “guests” who do not provide any information to the system beyond stepping into the capture space. A temporary personal profile may be established for one or more guests. At the end of a guest session, the guest gesture profile may be stored or deleted.

The gestures library 192, gestures recognition engine 190, and gesture data 205a-205n may be implemented in hardware, software, or a combination of both. For example, the gestures library 192 and gestures recognition engine 190 may be implemented as software that executes on a processor, such as processor 195, of the computing environment 212 (or on processing unit 101 of FIG. 6 or processing unit 259 of FIG. 7).

It is emphasized that the block diagrams depicted in FIG. 4 and FIGS. 6 and 7 described below are exemplary and not intended to imply a specific implementation. Thus, the computing environment 212 in FIG. 1, the processor 195 of FIG. 4, and the processing unit 259 of FIG. 7 can be implemented as a single processor or multiple processors. Multiple processors can be distributed or centrally located. For example, the gestures library 192 may be implemented as software that executes on the processor 32 of the capture device or it may be implemented as software that executes on the processor 195 in the computing environment 212. Any combination of processors that is suitable for performing the techniques disclosed herein is contemplated. Multiple processors can communicate wirelessly, via hard wire, or a combination thereof.

The gestures library and filter parameters may be tuned for an application or a context of an application by a gesture tool. A context may be a cultural context, and it may be an environmental context. A cultural context refers to the culture of a user using a system. Different cultures may use similar gestures to impart markedly different meanings. For instance, an American user who wishes to tell another user to “look” or “use his eyes” may put his index finger on his head close to the distal side of his eye. However, to an Italian user, this gesture may be interpreted as a reference to the mafia.

Similarly, there may be different contexts among different environments of a single application. Take a first-person shooter game that involves operating a motor vehicle. While the user is on foot, making a fist with the fingers towards the ground and extending the fist in front of and away from the body may represent a punching gesture. While the user is in the driving context, that same motion may represent a “gear shifting” gesture.

Gestures may be grouped together into genre packages of complementary gestures that are likely to be used by an application in that genre. Complementary gestures—either complementary as in those that are commonly used together, or complementary as in a change in a parameter of one will change a parameter of another—may be grouped together into genre packages. These packages may be provided to an application, which may select at least one. The application may tune, or modify, the parameter of a gesture or gesture filter 191 to best fit the unique aspects of the application. When that parameter is tuned, a second, complementary parameter (in the inter-dependent sense) of either the gesture or a second gesture is also tuned such that the parameters remain complementary. Genre packages for video games may include genres such as first-person shooter, action, driving, and sports.

FIG. 5A depicts an example skeletal mapping of a user that may be generated from the capture device 202. In this embodiment, a variety of joints and bones are identified: each hand 502, each forearm 504, each elbow 506, each bicep 508, each shoulder 510, each hip 512, each thigh 514, each knee 516, each foreleg 518, each foot 520, the head 522, the torso 524, the top 526 and bottom 528 of the spine, and the waist 530. Where more points are tracked, additional features may be identified, such as the bones and joints of the fingers or toes, or individual features of the face, such as the nose and eyes.

Through moving his body, a user may create gestures. A gesture comprises a motion or pose by a user that may be captured as image data and parsed for meaning. A gesture may be dynamic, comprising a motion, such as mimicking throwing a ball. A gesture may be a static pose, such as holding one's crossed forearms 504 in front of his torso 524. A gesture may be a single movement (e.g., a jump) or a continuous gesture (e.g., driving), and may be short in duration or long in duration (e.g., driving for 20 minutes). A gesture may also incorporate props, such as by swinging a mock sword. A gesture may comprise more than one body part, such as clapping the hands 502 together, or a subtler motion, such as pursing one's lips.

A user's gestures may be used for input in a general computing context. For instance, various motions of the hands 502 or other body parts may correspond to common system-wide tasks such as navigating up or down in a hierarchical list, opening a file, closing a file, and saving a file. For instance, a user may hold his hand with the fingers pointing up and the palm facing the capture device 202. He may then close his fingers towards the palm to make a fist, and this could be a gesture that indicates that the focused window in a window-based user-interface computing environment should be closed. Gestures may also be used in a video-game-specific context, depending on the game. For instance, with a driving game, various motions of the hands 502 and feet 520 may correspond to steering a vehicle in a direction, shifting gears, accelerating, and braking. Thus, a gesture may indicate a wide variety of motions that map to a displayed user representation, and in a wide variety of applications, such as video games, text editors, word processing, data management, etc.

A user may generate a gesture that corresponds to walking or running by walking or running in place. For example, the user may alternately lift and drop each leg 512-520 to mimic walking without moving. The system may parse this gesture by analyzing each hip 512 and each thigh 514. A step may be recognized when one hip-thigh angle (as measured relative to a vertical line, wherein a standing leg has a hip-thigh angle of 0°, and a forward horizontally extended leg has a hip-thigh angle of 90°) exceeds a certain threshold relative to the other thigh. A walk or run may be recognized after some number of consecutive steps by alternating legs. The time between the two most recent steps may be thought of as a period. After some number of periods where the threshold angle is not met, the system may determine that the walk or run gesture has ceased.

Given a “walk or run” gesture, an application may set values for parameters associated with this gesture. These parameters may include the above threshold angle, the number of steps required to initiate a walk or run gesture, a number of periods where no step occurs to end the gesture, and a threshold period that determines whether the gesture is a walk or a run. A fast period may correspond to a run, as the user will be moving his legs quickly, and a slower period may correspond to a walk.
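A compact sketch of the step and period logic described in the two paragraphs above, taking per-frame hip-thigh angles as input; the threshold angle and run period here are illustrative assumptions an application might override.

    # Hypothetical sketch: recognize a step when one hip-thigh angle exceeds
    # a threshold relative to the other, and classify walk vs. run by period.

    STEP_ANGLE_THRESHOLD_DEG = 25.0   # angle one thigh must exceed the other by
    RUN_PERIOD_S = 0.4                # steps faster than this suggest a run

    def detect_steps(angles_per_frame, frame_dt_s):
        """angles_per_frame: sequence of (left_deg, right_deg) hip-thigh angles.
        Returns a list of step times in seconds."""
        steps = []
        previous_leading = None
        for i, (left, right) in enumerate(angles_per_frame):
            if left - right > STEP_ANGLE_THRESHOLD_DEG:
                leading = "left"
            elif right - left > STEP_ANGLE_THRESHOLD_DEG:
                leading = "right"
            else:
                leading = None
            # A new step registers only when the leading leg alternates.
            if leading and leading != previous_leading:
                steps.append(i * frame_dt_s)
                previous_leading = leading
        return steps

    def classify_gait(step_times):
        """Return 'run', 'walk', or None based on the most recent step period."""
        if len(step_times) < 2:
            return None
        period = step_times[-1] - step_times[-2]
        return "run" if period < RUN_PERIOD_S else "walk"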

A gesture may be associated with a set of default parameters at first that the application may override with its own parameters. In this scenario, an application is not forced to provide parameters, but may instead use a set of default parameters that allow the gesture to be recognized in the absence of application-defined parameters. Information related to the gesture may be stored for purposes of pre-canned gesture animation.

There are a variety of outputs that may be associated with the gesture. There may be a baseline “yes or no” as to whether a gesture is occurring. There also may be a confidence level, which corresponds to the likelihood that the user's tracked movement corresponds to the gesture. This could be a linear scale that ranges over floating point numbers between 0 and 1, inclusive. Where an application receiving this gesture information cannot accept false positives as input, it may use only those recognized gestures that have a high confidence level, such as at least 0.95. Where an application must recognize every instance of the gesture, even at the cost of false positives, it may use gestures that have at least a much lower confidence level, such as those merely greater than 0.2. The gesture may have an output for the time between the two most recent steps, and where only a first step has been registered, this may be set to a reserved value, such as −1 (since the time between any two steps must be positive). The gesture may also have an output for the highest thigh angle reached during the most recent step.
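The outputs listed above could be bundled into a small record, with each application applying its own confidence cutoff (0.95 versus 0.2 in the examples); the structure and field names below are a sketch, not an interface defined by this disclosure.

    # Hypothetical sketch: gesture outputs with an application-chosen
    # confidence cutoff for accepting the gesture as input.

    from dataclasses import dataclass

    @dataclass
    class WalkRunOutput:
        occurring: bool            # baseline yes/no
        confidence: float          # 0.0 .. 1.0
        last_step_period_s: float  # -1 reserved for "only one step registered"
        peak_thigh_angle_deg: float

    def accept_gesture(output: WalkRunOutput, min_confidence: float) -> bool:
        """An input-critical app might pass 0.95; a permissive one, 0.2."""
        return output.occurring and output.confidence >= min_confidence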

Another exemplary gesture is a “heel lift jump.” In this gesture, a user may create the gesture by raising his heels off the ground, but keeping his toes planted. Alternatively, the user may jump into the air where his feet 520 leave the ground entirely. The system may parse the skeleton for this gesture by analyzing the angle relation of the shoulders 510, hips 512 and knees 516 to see if they are in a position of alignment equal to standing up straight. Then these points and upper 526 and lower 528 spine points may be monitored for any upward acceleration. A sufficient combination of acceleration may trigger a jump gesture. A sufficient combination of acceleration with a particular gesture may satisfy the parameters of a transition point.

Given this “heel lift jump” gesture, an application may set values for parameters associated with this gesture. The parameters may include the above acceleration threshold, which determines how fast some combination of the user's shoulders 510, hips 512 and knees 516 must move upward to trigger the gesture, as well as a maximum angle of alignment between the shoulders 510, hips 512 and knees 516 at which a jump may still be triggered. The outputs may comprise a confidence level, as well as the user's body angle at the time of the jump.
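A sketch of the jump check described above: verify that shoulders, hips, and knees are roughly aligned (standing straight), then look for sufficient upward acceleration of the tracked points; the thresholds and names are illustrative assumptions.

    # Hypothetical sketch: trigger a "heel lift jump" when the body is
    # upright and the tracked points accelerate upward fast enough.

    MAX_ALIGNMENT_DEG = 10.0          # how far from vertical the body may lean
    MIN_UPWARD_ACCEL_M_S2 = 3.0       # upward acceleration needed to trigger

    def is_upright(alignment_angle_deg: float) -> bool:
        """Shoulders, hips, and knees roughly in a vertical line."""
        return abs(alignment_angle_deg) <= MAX_ALIGNMENT_DEG

    def heel_lift_jump(alignment_angle_deg: float,
                       upward_accels_m_s2: list[float]) -> bool:
        """upward_accels_m_s2: recent upward accelerations of the shoulders,
        hips, knees, and upper/lower spine points."""
        if not upward_accels_m_s2 or not is_upright(alignment_angle_deg):
            return False
        mean_accel = sum(upward_accels_m_s2) / len(upward_accels_m_s2)
        return mean_accel >= MIN_UPWARD_ACCEL_M_S2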

Setting parameters for a gesture based on the particulars of the application that will receive the gesture is important in accurately identifying gestures. Properly identifying gestures and the intent of a user greatly helps in creating a positive user experience.

An application may set values for parameters associated with various transition points to identify the points at which to use pre-canned animations. Transition points may be defined by various parameters, such as the identification of a particular gesture, a velocity, an angle of a target or object, or any combination thereof. If a transition point is defined at least in part by the identification of a particular gesture, then properly identifying gestures assists to increase the confidence level that the parameters of a transition point have been met.

Another parameter to a gesture may be a distance moved. Where a user's gestures control the actions of an avatar in a virtual environment, that avatar may be arm's length from a ball. If the user wishes to interact with the ball and grab it, this may require the user to extend his arm 502-510 to full length while making the grab gesture. In this situation, a similar grab gesture where the user only partially extends his arm 502-510 may not achieve the result of interacting with the ball. Likewise, a parameter of a transition point could be the identification of the grab gesture, where if the user only partially extends his arm 502-510, thereby not achieving the result of interacting with the ball, the user's gesture also will not meet the parameters of the transition point.

A gesture or a portion thereof may have as a parameter a volume of space in which it must occur. This volume of space may typically be expressed in relation to the body where a gesture comprises body movement. For instance, a football throwing gesture for a right-handed user may be recognized only in the volume of space no lower than the right shoulder 510a, and on the same side of the head 522 as the throwing arm 502a-510a. It may not be necessary to define all bounds of a volume, such as with this throwing gesture, where an outer bound away from the body is left undefined, and the volume extends out indefinitely, or to the edge of the scene that is being monitored.

FIG. 5B provides further details of one exemplary embodiment of the gesture recognizer engine 190 of FIG. 4. As shown, the gesture recognizer engine 190 may comprise at least one filter 519 to determine a gesture or gestures. A filter 519 comprises information defining a gesture 526 (hereinafter referred to as a “gesture”), and may comprise at least one parameter 528, or metadata, for that gesture 526. For instance, a throw, which comprises motion of one of the hands from behind the rear of the body to past the front of the body, may be implemented as a gesture 526 comprising information representing the movement of one of the hands of the user from behind the rear of the body to past the front of the body, as that movement would be captured by the depth camera. Parameters 528 may then be set for that gesture 526. Where the gesture 526 is a throw, a parameter 528 may be a threshold velocity that the hand has to reach, a distance the hand must travel (either absolute, or relative to the size of the user as a whole), and a confidence rating by the recognizer engine 190 that the gesture 526 occurred. These parameters 528 for the gesture 526 may vary between applications, between contexts of a single application, or within one context of one application over time.

Filters may be modular or interchangeable. In an embodiment, a filter has a number of inputs, each of those inputs having a type, and a number of outputs, each of those outputs having a type. In this situation, a first filter may be replaced with a second filter that has the same number and types of inputs and outputs as the first filter without altering any other aspect of the recognizer engine 190 architecture. For instance, there may be a first filter for driving that takes as input skeletal data and outputs a confidence that the gesture 526 associated with the filter is occurring and an angle of steering. Where one wishes to substitute this first driving filter with a second driving filter—perhaps because the second driving filter is more efficient and requires fewer processing resources—one may do so by simply replacing the first filter with the second filter so long as the second filter has those same inputs and outputs—one input of skeletal data type, and two outputs of confidence type and angle type.
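The modularity described here amounts to a fixed input/output signature; one way to express it, sketched below as an assumption rather than the engine's actual API, is an abstract base class that any interchangeable driving filter must satisfy.

    # Hypothetical sketch: filters are interchangeable as long as they keep
    # the same typed inputs and outputs (skeletal data in; confidence and
    # steering angle out).

    from abc import ABC, abstractmethod

    class DrivingFilter(ABC):
        @abstractmethod
        def evaluate(self, skeletal_data: dict) -> tuple[float, float]:
            """Return (confidence 0..1, steering angle in degrees)."""

    class SimpleDrivingFilter(DrivingFilter):
        def evaluate(self, skeletal_data):
            # Toy heuristic: steering angle from relative hand heights.
            left = skeletal_data.get("left_hand_y", 0.0)
            right = skeletal_data.get("right_hand_y", 0.0)
            return 0.8, (right - left) * 90.0

    class EfficientDrivingFilter(DrivingFilter):
        def evaluate(self, skeletal_data):
            # A cheaper filter can replace SimpleDrivingFilter anywhere,
            # because the signature (inputs and outputs) is unchanged.
            return 0.75, skeletal_data.get("wheel_angle_estimate", 0.0)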

A filter need not have a parameter 528. For instance, a “user height” filter that returns the user's height may not allow for any parameters that may be tuned. An alternate “user height” filter may have tunable parameters—such as whether to account for a user's footwear, hairstyle, headwear and posture in determining the user's height.

Inputs to a filter may comprise things such as joint data about a user's joint position, like angles formed by the bones that meet at the joint, RGB color data from the scene, and the rate of change of an aspect of the user. Outputs from a filter may comprise things such as the confidence that a given gesture is being made, the speed at which a gesture motion is made, and a time at which a gesture motion is made.

The gesture recognizer engine 190 may have a base recognizer engine 517 that provides functionality to a gesture filter 519. In an embodiment, the functionality that the recognizer engine 517 implements includes an input-over-time archive that tracks recognized gestures and other input, a Hidden Markov Model implementation (where the modeled system is assumed to be a Markov process—one where a present state encapsulates any past state information necessary to determine a future state, so no other past state information must be maintained for this purpose—with unknown parameters, and hidden parameters are determined from the observable data), as well as other functionality required to solve particular instances of gesture recognition.

Filters 519 are loaded and implemented on top of the base recognizer engine 517 and can utilize services provided by the engine 517 to all filters 519. In an embodiment, the base recognizer engine 517 processes received data to determine whether it meets the requirements of any filter 519. Since these provided services, such as parsing the input, are provided once by the base recognizer engine 517 rather than by each filter 519, such a service need only be processed once in a period of time as opposed to once per filter 519 for that period, so the processing required to determine gestures is reduced.

An application may use the filters 519 provided by the recognizer engine 190, or it may provide its own filter 519, which plugs in to the base recognizer engine 517. Similarly, the gesture profile may plug in to the base recognizer engine 517. In an embodiment, all filters 519 have a common interface to enable this plug-in characteristic. Further, all filters 519 may utilize parameters 528, so a single gesture tool as described below may be used to debug and tune the entire filter system 519.
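
The following hypothetical sketch combines the two preceding ideas: a base engine parses each frame of capture data once and then hands the parsed result to every registered filter, whether built-in or supplied by an application, through one common interface. Class and method names are assumptions for illustration.

```python
# Sketch of a base recognizer engine with shared parsing and plug-in filters.
from typing import Dict, List, Protocol

class GestureFilter(Protocol):
    name: str
    def evaluate(self, parsed_frame: Dict[str, float]) -> float:
        """Return a confidence in [0, 1] that this filter's gesture occurred."""
        ...

class BaseRecognizerEngine:
    def __init__(self) -> None:
        self._filters: List[GestureFilter] = []

    def register(self, gesture_filter: GestureFilter) -> None:
        # Applications may plug in their own filters alongside built-in ones.
        self._filters.append(gesture_filter)

    def process(self, raw_frame: bytes) -> Dict[str, float]:
        parsed = self._parse(raw_frame)  # shared service: input parsed once per frame
        return {f.name: f.evaluate(parsed) for f in self._filters}

    @staticmethod
    def _parse(raw_frame: bytes) -> Dict[str, float]:
        # Placeholder for skeletal extraction that all filters would share.
        return {"hand_speed": float(len(raw_frame) % 10)}
```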

These parameters 528 may be tuned for an application or a context of an application by a gesture tool 521. In an embodiment, the gesture tool 521 comprises a plurality of sliders 523, each slider 523 corresponding to a parameter 528, as well as a pictorial representation of a body 524. As a parameter 528 is adjusted with a corresponding slider 523, the body 524 may demonstrate both actions that would be recognized as the gesture with those parameters 528 and actions that would not be recognized as the gesture with those parameters 528, identified as such. This visualization of the parameters 528 of gestures provides an effective means to both debug and fine tune a gesture.
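
A rough sketch of the tuning loop such a tool might run: for the current slider values, recorded sample motions are replayed and each is reported as recognized or not. The sample data, function names, and thresholds below are invented for illustration.

```python
# Hypothetical preview loop behind a slider-based gesture tuning tool.
samples = [
    {"label": "fast overhand throw", "hand_velocity": 3.1, "hand_travel": 0.6},
    {"label": "slow wave",           "hand_velocity": 0.8, "hand_travel": 0.2},
]

def preview(min_velocity: float, min_travel: float, min_confidence: float = 0.8):
    for sample in samples:
        velocity_score = min(sample["hand_velocity"] / min_velocity, 1.0)
        distance_score = min(sample["hand_travel"] / min_travel, 1.0)
        recognized = velocity_score * distance_score >= min_confidence
        print(f'{sample["label"]}: {"recognized" if recognized else "not recognized"}')

preview(min_velocity=2.5, min_travel=0.4)   # values a slider might set
```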

The computer executable instructions may comprise instructions for roaming a gesture profile, comprising instructions for identifying the gesture profile associated with a user, wherein the gesture profile comprises personalized gesture information for the user, and wherein the personalized gesture information is derived from data captured by a capture device and representative of a user's position or motion in a physical space; and roaming the gesture profile via a network connection. The instructions may further comprise instructions for receiving a request for the gesture profile, activating the gesture profile based on an identity of the user, and identifying the user from profile data.
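
One way such roaming could be sketched, under stated assumptions: the profile is identified for a user, serialized, and sent to whatever endpoint requested it. The endpoint URL and payload field names are hypothetical.

```python
# Hedged sketch of roaming a gesture profile over a network connection.
import json
import urllib.request

def roam_gesture_profile(user_id: str, profiles: dict, endpoint: str) -> None:
    profile = profiles[user_id]   # identify the gesture profile associated with the user
    payload = json.dumps({"user": user_id, "personalized_gestures": profile}).encode()
    request = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request)   # roam the profile via the network connection

profiles = {"alice": {"throw": {"min_hand_velocity": 1.8}}}
# roam_gesture_profile("alice", profiles, "https://example.invalid/profiles")
```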

The computer executable instructions may also comprise instructions for gesture recognition based on a user's gesture profile, including instructions for activating a gesture profile associated with a user, wherein the gesture profile comprises personalized gesture information for the user, and wherein the personalized gesture information is derived from data captured by a capture device and representative of a user's position or motion in a physical space; and recognizing a user's gesture by comparing the received data to the personalized gesture information in the gesture profile.
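
A sketch of recognition against personalized gesture information, assuming a simple nearest-template comparison (the similarity measure and threshold are assumptions, not the disclosed method): activate the user's profile, then pick the stored gesture whose template best matches the received motion data.

```python
# Hypothetical matching of received motion data against a user's gesture profile.
from math import dist

def recognize(received: tuple, active_profile: dict, threshold: float = 0.5):
    """received: (hand_velocity, hand_travel); profile maps gesture name -> template."""
    best_name, best_distance = None, float("inf")
    for name, template in active_profile.items():
        d = dist(received, (template["hand_velocity"], template["hand_travel"]))
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= threshold else None

alice_profile = {"throw": {"hand_velocity": 2.0, "hand_travel": 0.5},
                 "wave":  {"hand_velocity": 0.9, "hand_travel": 0.2}}
print(recognize((1.9, 0.45), alice_profile))   # -> "throw"
```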

FIG. 6 illustrates an example embodiment of a computing environment that may be used to interpret one or more gestures in a target recognition, analysis, and tracking system. The computing environment such as the computing environment 212 described above with respect to FIG. 1 may be a multimedia console 100, such as a gaming console. As shown in FIG. 6, the multimedia console 100 has a central processing unit (CPU) 101 having a level 1 cache 102, a level 2 cache 104, and a flash ROM (Read Only Memory) 106. The level 1 cache 102 and the level 2 cache 104 temporarily store data and hence reduce the number of memory access cycles, thereby improving processing speed and throughput. The CPU 101 may be provided having more than one core, and thus, additional level 1 and level 2 caches 102 and 104. The flash ROM 106 may store executable code that is loaded during an initial phase of a boot process when the multimedia console 100 is powered ON.

A graphics processing unit (GPU) 108 and a video encoder/video codec (coder/decoder) 114 form a video processing pipeline for high speed and high resolution graphics processing. Data is carried from the graphics processing unit 108 to the video encoder/video codec 114 via a bus. The video processing pipeline outputs data to an A/V (audio/video) port 140 for transmission to a television or other display. A memory controller 110 is connected to the GPU 108 to facilitate processor access to various types of memory 112, such as, but not limited to, a RAM (Random Access Memory).

The multimedia console 100 includes an I/O controller 2120, a system management controller 2122, an audio processing unit 2123, a network interface controller 2124, a first USB host controller 2126, a second USB controller 2128 and a front panel I/O subassembly 130 that are preferably implemented on a module 118. The USB controllers 2126 and 2128 serve as hosts for peripheral controllers 142(1)-142(2), a wireless adapter 148, and an external memory device 146 (e.g., flash memory, external CD/DVD ROM drive, removable media, etc.). The network interface 2124 and/or wireless adapter 148 provide access to a network (e.g., the Internet, home network, etc.) and may be any of a wide variety of wired or wireless adapter components including an Ethernet card, a modem, a Bluetooth module, a cable modem, and the like.

System memory 143 is provided to store application data that is loaded during the boot process. A media drive 144 is provided and may comprise a DVD/CD drive, hard drive, or other removable media drive, etc. The media drive 144 may be internal or external to the multimedia console 100. Application data may be accessed via the media drive 144 for execution, playback, etc. by the multimedia console 100. The media drive 144 is connected to the I/O controller 2120 via a bus, such as a Serial ATA bus or other high speed connection (e.g., IEEE 1394).

The system management controller 2122 provides a variety of service functions related to assuring availability of the multimedia console 100. The audio processing unit 2123 and an audio codec 132 form a corresponding audio processing pipeline with high fidelity and stereo processing. Audio data is carried between the audio processing unit 2123 and the audio codec 132 via a communication link. The audio processing pipeline outputs data to the A/V port 140 for reproduction by an external audio player or device having audio capabilities.

The front panel I/O subassembly 130 supports the functionality of the power button 150 and the eject button 152, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the multimedia console 100. A system power supply module 136 provides power to the components of the multimedia console 100. A fan 138 cools the circuitry within the multimedia console 100.

The CPU 101, GPU 108, memory controller 110, and various other components within the multimedia console 100 are interconnected via one or more buses, including serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can include a Peripheral Component Interconnects (PCI) bus, PCI-Express bus, etc.

When the multimedia console 100 is powered ON, application data may be loaded from the system memory 143 into memory 112 and/or caches 102, 104 and executed on the CPU 101. The application may present a graphical user interface that provides a consistent user experience when navigating to different media types available on the multimedia console 100. In operation, applications and/or other media contained within the media drive 144 may be launched or played from the media drive 144 to provide additional functionalities to the multimedia console 100.

The multimedia console 100 may be operated as a standalone system by simply connecting the system to a television or other display. In this standalone mode, the multimedia console 100 allows one or more users to interact with the system, watch movies, or listen to music. However, with the integration of broadband connectivity made available through the network interface 2124 or the wireless adapter 148, the multimedia console 100 may further be operated as a participant in a larger network community.

When the multimedia console 100 is powered ON, a set amount of hardware resources are reserved for system use by the multimedia console operating system. These resources may include a reservation of memory (e.g., 16 MB), CPU and GPU cycles (e.g., 5%), networking bandwidth (e.g., 8 kbps), etc. Because these resources are reserved at system boot time, the reserved resources do not exist from the application's view.
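
Purely as an illustration of the reservation idea, the figures above could be captured in a small structure consulted at boot; the values simply echo the examples given, and the names are invented.

```python
# Illustrative-only representation of the boot-time system resource reservation.
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemReservation:
    memory_mb: int = 16            # memory held back for system use
    cpu_gpu_cycles_pct: float = 5.0
    network_kbps: int = 8

RESERVED = SystemReservation()

def visible_to_application(total_memory_mb: int) -> int:
    # Reserved resources do not exist from the application's point of view.
    return total_memory_mb - RESERVED.memory_mb

print(visible_to_application(512))   # e.g., 496 MB visible to the game
```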

In particular, the memory reservation preferably is large enough to contain the launch kernel, concurrent system applications and drivers. The CPU reservation is preferably constant such that if the reserved CPU usage is not used by the system applications, an idle thread will consume any unused cycles.

With regard to the GPU reservation, lightweight messages generated by the system applications (e.g., pop-ups) are displayed by using a GPU interrupt to schedule code to render a popup into an overlay. The amount of memory required for an overlay depends on the overlay area size and the overlay preferably scales with screen resolution. Where a full user interface is used by the concurrent system application, it is preferable to use a resolution independent of application resolution. A scaler may be used to set this resolution such that the need to change frequency and cause a TV resynch is eliminated.

After the multimedia console 100 boots and system resources are reserved, concurrent system applications execute to provide system functionalities. The system functionalities are encapsulated in a set of system applications that execute within the reserved system resources described above. The operating system kernel identifies threads that are system application threads versus gaming application threads. The system applications are preferably scheduled to run on the CPU 101 at predetermined times and intervals in order to provide a consistent system resource view to the application. The scheduling is to minimize cache disruption for the gaming application running on the console.

When a concurrent system application requires audio, audio processing is scheduled asynchronously to the gaming application due to time sensitivity. A multimedia console application manager (described below) controls the gaming application audio level (e.g., mute, attenuate) when system applications are active.

Input devices (e.g., controllers 142(1) and 142(2)) are shared by gaming applications and system applications. The input devices are not reserved resources, but are to be switched between system applications and the gaming application such that each will have a focus of the device. The application manager preferably controls the switching of the input stream, without the gaming application's knowledge, and a driver maintains state information regarding focus switches. The cameras 26, 28 and capture device 202 may define additional input devices for the console 100.

FIG. 7 illustrates another example embodiment of a computing environment 220 that may be the computing environment 212 shown in FIG. 1 used to interpret one or more gestures in a target recognition, analysis, and tracking system. The computing system environment 220 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the presently disclosed subject matter. Neither should the computing environment 220 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 220. In some embodiments the various depicted computing elements may include circuitry configured to instantiate specific aspects of the present disclosure. For example, the term circuitry used in the disclosure can include specialized hardware components configured to perform function(s) by firmware or switches. In other example embodiments the term circuitry can include a general purpose processing unit, memory, etc., configured by software instructions that embody logic operable to perform function(s). In example embodiments where circuitry includes a combination of hardware and software, an implementer may write source code embodying logic and the source code can be compiled into machine readable code that can be processed by the general purpose processing unit. Since one skilled in the art can appreciate that the state of the art has evolved to a point where there is little difference between hardware, software, or a combination of hardware/software, the selection of hardware versus software to effectuate specific functions is a design choice left to an implementer. More specifically, one of skill in the art can appreciate that a software process can be transformed into an equivalent hardware structure, and a hardware structure can itself be transformed into an equivalent software process. Thus, the selection of a hardware implementation versus a software implementation is one of design choice and left to the implementer.

In FIG. 7, the computing environment 220 comprises a computer 241, which typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 241 and includes both volatile and nonvolatile media, removable and non-removable media. The system memory 222 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 223 and random access memory (RAM) 261. A basic input/output system 224 (BIOS), containing the basic routines that help to transfer information between elements within computer 241, such as during start-up, is typically stored in ROM 223. RAM 261 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 259. By way of example, and not limitation, FIG. 7 illustrates operating system 225, application programs 226, other program modules 227, and program data 228.

The computer 241 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 7 illustrates a hard disk drive 238 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 239 that reads from or writes to a removable, nonvolatile magnetic disk 254, and an optical disk drive 240 that reads from or writes to a removable, nonvolatile optical disk 253 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 238 is typically connected to the system bus 221 through a non-removable memory interface such as interface 234, and magnetic disk drive 239 and optical disk drive 240 are typically connected to the system bus 221 by a removable memory interface, such as interface 235.

The drives and their associated computer storage media discussed above and illustrated in FIG. 7 provide storage of computer readable instructions, data structures, program modules and other data for the computer 241. In FIG. 7, for example, hard disk drive 238 is illustrated as storing operating system 258, application programs 257, other program modules 256, and program data 255. Note that these components can either be the same as or different from operating system 225, application programs 226, other program modules 227, and program data 228. Operating system 258, application programs 257, other program modules 256, and program data 255 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 241 through input devices such as a keyboard 251 and pointing device 252, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 259 through a user input interface 236 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). The cameras 26, 28 and capture device 202 may define additional input devices for the console 100. A monitor 242 or other type of display device is also connected to the system bus 221 via an interface, such as a video interface 232. In addition to the monitor, computers may also include other peripheral output devices such as speakers 244 and printer 243, which may be connected through an output peripheral interface 233.

The computer 241 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 246. The remote computer 246 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 241, although only a memory storage device 247 has been illustrated in FIG. 7. The logical connections depicted in FIG. 7 include a local area network (LAN) 245 and a wide area network (WAN) 249, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 241 is connected to the LAN 245 through a network interface or adapter 237. When used in a WAN networking environment, the computer 241 typically includes a modem 250 or other means for establishing communications over the WAN 249, such as the Internet. The modem 250, which may be internal or external, may be connected to the system bus 221 via the user input interface 236, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 241, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 7 illustrates remote application programs 248 as residing on memory device 247. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

It should be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered limiting. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated may be performed in the sequence illustrated, in other sequences, in parallel, or the like. Likewise, the order of the above-described processes may be changed.

Furthermore, while the present disclosure has been described in connection with the particular aspects, as illustrated in the various figures, it is understood that other similar aspects may be used or modifications and additions may be made to the described aspects for performing the same function of the present disclosure without deviating therefrom. The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof. Thus, the methods and apparatus of the disclosed embodiments, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus configured for practicing the disclosed embodiments.

In addition to the specific implementations explicitly set forth herein, other aspects and implementations will be apparent to those skilled in the art from consideration of the specification disclosed herein. Therefore, the present disclosure should not be limited to any single aspect, but rather construed in breadth and scope in accordance with the appended claims. For example, the various procedures described herein may be implemented with hardware or software, or a combination of both.

What is claimed:
1. A method for gesture recognition, the method comprising: receiving data of a physical space, wherein the received data is representative of a user and a user's gesture in the physical space; correlating a position of the user in the physical space to a plurality of virtual zones, wherein each of said plurality of virtual zones is representative of a respective portion of the physical space; determining a preferred detection order for the plurality of virtual zones; comparing the received data to gesture data according to an ordering of sets of the gesture data, wherein (i) each of said plurality of virtual zones is associated with a respective set of gesture data, and (ii) the ordering of the sets of gesture data is in accordance with the preferred detection order of the plurality of virtual zones each associated with a respective set of gesture data; and identifying the user's gesture from at least one of the sets of gesture data, wherein, when the user's gesture does not correspond to a set of gesture data associated with a most preferred virtual zone in the preferred detection order of the plurality of virtual zones, a set of gesture data associated with each next most preferred virtual zone is attempted until the user's gesture is identified.
2. The method of claim 1, wherein the user's position correlates to a virtual zone when at least one of (i) the position of the user is within the portion of the physical space represented by the virtual zone, (ii) the position of the user is within a predetermined distance from the portion of the physical space represented by the virtual zone, or (iii) the position of the user is adjacent to a boundary of the virtual zone.
3. The method of claim 1, wherein the user's position correlates to first and second virtual zones, wherein the first and second virtual zones represent an overlapping portion of the physical space and the user's position is within the overlapping portion of the physical space.
4. The method of claim 1, wherein the preferred detection order of the plurality of virtual zones is determined based on a probabilistic approach.
5. The method of claim 1, wherein the user's position correlates to two virtual zones when the user's position crosses a boundary between the two virtual zones within a predetermined time period, wherein the sets of gesture data associated with each of the two virtual zones are given a same position in the preferred detection order such that comparing received data to the gesture data comprises comparing the received data to both sets of gesture data.
6. The method of claim 1, wherein the preferred detection order of the plurality of virtual zones is an order corresponding to an increasing distance of the position of the user to each of the plurality of virtual zones.
7. The method of claim 1, wherein the preferred detection order of the plurality of virtual zones is an order corresponding to a level of correlation between the received data representative of the user's gesture and the gesture data in the sets of gesture data associated with each of the plurality of virtual zones.
8. A computer-readable storage medium having stored thereon computer-executable instructions for gesture recognition, the instructions comprising: receiving data of a physical space, wherein the received data is representative of a user and a user's gesture in the physical space; correlating a position of the user in the physical space to a plurality of virtual zones, wherein each of said plurality of virtual zones is representative of a respective portion of the physical space; determining a preferred detection order for the plurality of virtual zones; comparing the received data to gesture data according to an ordering of sets of the gesture data, wherein (i) each of said plurality of virtual zones is associated with a respective set of gesture data, and (ii) the ordering of the sets of gesture data is in accordance with the preferred detection order of the plurality of virtual zones each associated with a respective set of gesture data; and identifying the user's gesture from at least one of the sets of gesture data, wherein, when the user's gesture does not correspond to a set of gesture data associated with a most preferred virtual zone in the preferred detection order of the plurality of virtual zones, a set of gesture data associated with each next most preferred virtual zone is attempted until the user's gesture is identified.
9. The computer-readable storage medium of claim 8, wherein the user's position correlates to a virtual zone when at least one of (i) the position of the user is within the portion of the physical space represented by the virtual zone, (ii) the position of the user is within a predetermined distance from the portion of the physical space represented by the virtual zone, or (iii) the position of the user is adjacent to a boundary of the virtual zone.
10. The computer-readable storage medium of claim 8, wherein the user's position correlates to first and second virtual zones, wherein the first and second virtual zones represent an overlapping portion of the physical space and the user's position is within the overlapping portion of the physical space.
11. The computer-readable storage medium of claim 8, wherein the preferred detection order of the plurality of virtual zones is determined based on a probabilistic approach.
12. The computer-readable storage medium of claim 8, wherein the user's position correlates to two virtual zones when the user's position crosses a boundary between the two virtual zones within a predetermined time period, wherein the sets of gesture data associated with each of the two virtual zones are given a same position in the preferred detection order such that comparing received data to the gesture data comprises comparing the received data to both sets of gesture data.
13. The computer-readable storage medium of claim 8, wherein the preferred detection order of the plurality of virtual zones is an order corresponding to an increasing distance of the position of the user to each of the plurality of virtual zones.
14. The computer-readable storage medium of claim 8, wherein the preferred detection order of the plurality of virtual zones is an order corresponding to a level of correlation between the received data representative of the user's gesture and the gesture data in the sets of gesture data associated with each of the plurality of virtual zones.
15. A system comprising: a processor; memory having stored therein computer-executable instructions for gesture recognition, the instructions, when executed by the processor, at least causing: receiving data of a physical space, wherein the received data is representative of a user and a user's gesture in the physical space; correlating a position of the user in the physical space to a plurality of virtual zones, wherein each of said plurality of virtual zones is representative of a respective portion of the physical space; determining a preferred detection order for the plurality of virtual zones; comparing the received data to gesture data according to an ordering of sets of the gesture data, wherein (i) each of said plurality of virtual zones is associated with a respective set of gesture data, and (ii) the ordering of the sets of gesture data is in accordance with the preferred detection order of the plurality of virtual zones each associated with a respective set of gesture data; and identifying the user's gesture from at least one of the sets of gesture data, wherein, when the user's gesture does not correspond to a set of gesture data associated with a most preferred virtual zone in the preferred detection order of the plurality of virtual zones, a set of gesture data associated with each next most preferred virtual zone is attempted until the user's gesture is identified.
16. The system of claim 15, wherein the user's position correlates to a virtual zone when at least one of (i) the position of the user is within the portion of the physical space represented by the virtual zone, (ii) the position of the user is within a predetermined distance from the portion of the physical space represented by the virtual zone, or (iii) the position of the user is adjacent to a boundary of the virtual zone.
17. The system of claim 15, wherein the user's position correlates to first and second virtual zones, wherein the first and second virtual zones represent an overlapping portion of the physical space and the user's position is within the overlapping portion of the physical space.
18. The system of claim 15, wherein the preferred detection order of the plurality of virtual zones is determined based on a probabilistic approach.
19. The system of claim 15, wherein the user's position correlates to two virtual zones when the user's position crosses a boundary between the two virtual zones within a predetermined time period, wherein the sets of gesture data associated with each of the two virtual zones are given a same position in the preferred detection order such that comparing received data to the gesture data comprises comparing the received data to both sets of gesture data.
20. The system of claim 15, wherein the preferred detection order of the plurality of virtual zones is an order corresponding to an increasing distance of the position of the user to each of the plurality of virtual zones.